<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Time-Embedding Travelers at WiC-ITA</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Francesco Periti</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haim Dubossarsky</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Queen Mary University of London</institution>
          ,
          <addr-line>England</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Milan</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The WiC-ITA shared task aims to determine whether a word appearing in two distinct sentences carries the same meaning. The task consists of two subtasks: binary classification (Subtask 1) and ranking (Subtask 2). Each subtask is designed in both a monolingual (Italian) and a multilingual (Italian-English) setting. In this report, we present the results of our participation in WiC-ITA. In our experiments, we leverage the condition number of the cosine similarity matrix between XLM-R embeddings and demonstrate competitive performance, ranking among the top positions in both the monolingual and cross-lingual settings. Our results indicate that semantic information is present not only in the last layers of XLM-R but throughout the entire architecture, including the middle layers. This suggests potential avenues for future research to explore the use of the complete set of embeddings, rather than solely relying on the embeddings extracted from the last layer(s).</p>
      </abstract>
      <kwd-group>
<kwd>Word-in-Context</kwd>
        <kwd>Contextualized Embeddings</kwd>
        <kwd>Condition Number</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In the last decade, the use of Word Embedding techniques has improved the modeling of lexical semantics. Initially, static embedding models were employed to encode the dominant semantics of a word into a single vector representation, i.e., a word embedding (Mikolov et al., 2013 [1]). However, understanding the meaning of words in their specific contexts is crucial for modeling language effectively. This motivated the recent efforts to create contextualized models capable of generating different vector representations according to the context in which the words occur (Devlin et al., 2019 [2]).</p>
      <p>Despite the growing popularity of contextualized embeddings in research fields such as Word Sense Disambiguation or Lexical Semantic Shift Detection (Scarlini et al., 2020 [3]; Montanelli and Periti, 2023 [4]), Word-in-Context (WiC) benchmarks that specifically focus on the dynamics of word semantics are relatively recent. The first WiC benchmarks were limited to English (Pilehvar et al., 2019 [5]; Loureiro et al., 2022 [6]). Their success prompted the development of new WiC benchmarks to cover a wider range of languages (Raganato et al., 2020 [7]; Liu et al., 2021 [8]), to test transfer learning in cross-lingual settings (Martelli et al., 2021 [9]), and to evaluate graded word similarity in context (Armendariz et al., 2020 [10]).</p>
      <p>The WiC-ITA shared task at EVALITA 2023 provides a novel benchmark for evaluating WiC in both a monolingual (L) setting in Italian and a cross-lingual (XL) setting from Italian to English (Cassotti et al., 2023 [11]; Lai et al., 2023 [12]). Inspired by the previous work, WiC-ITA challenges its participants with two sub-tasks: (1) Binary Classification: to establish whether a target word w occurring in a pair of sentences ⟨s1, s2⟩ has the same meaning or not (Subtask 1); (2) Ranking: to rank the pair of sentences ⟨s1, s2⟩ by the degree of similarity of the target word's meaning (Subtask 2).</p>
    </sec>
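    <p>To make the two sub-tasks concrete, the following is a purely hypothetical sketch: the record layout, sentences, score, and threshold below are our own illustrative choices, not the official WiC-ITA data format.</p>
    <preformat>
```python
# Purely illustrative sketch of the two WiC-ITA sub-tasks; the record
# layout, sentences, score, and threshold are hypothetical, not the
# official data format.

example = {
    "target": "pianta",  # Italian: "plant" (organism) or "plan, map"
    "sentence_1": "La pianta del primo piano è appesa al muro.",  # "floor plan"
    "sentence_2": "Ho annaffiato la pianta sul balcone.",         # "potted plant"
}

# Subtask 2 (Ranking): output a graded similarity score for the target
# word's meaning across the two sentences (gold labels range from 1,
# unrelated, to 4, identical).
graded_score = 1.3

# Subtask 1 (Binary Classification): threshold the same score to decide
# whether the meaning is the same (1) or not (0).
threshold = 2.5
binary_label = 1 if graded_score >= threshold else 0

print(binary_label)  # prints 0: the two usages are unrelated
```
    </preformat>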
    <sec id="sec-2">
      <title>2. Background and motivation</title>
      <p>BERT is a powerful contextualized model that leverages the Transformer encoder to capture the contextual semantics of words (Devlin et al., 2019 [2]; Vaswani et al., 2017 [13]). Typically, the success of BERT is attributed to its multi-layer (e.g., 12) and multi-head (e.g., 12) self-attention blocks. However, most state-of-the-art work only uses the outputs of the final layer(s) (i.e., word embeddings) as input for solving NLP tasks, while ignoring the output of the earlier layers. As a result, the role of different embedding layers in representing the semantics of word occurrences is still unclear. Recently, a limited number of studies have explored the nature and characteristics of BERT embeddings. In particular, Jawahar et al. (2019) [14] indicate that BERT's lower layers capture surface features pertaining to phrase-level information, middle layers capture syntactic features, and higher layers capture semantic features. Devlin et al. (2019) [2] report that combining the last four hidden layers can be beneficial for mainstream tasks such as Named Entity Recognition. Ethayarajh (2019) [15] demonstrates that the geometry of the embedding space exhibits anisotropy, meaning that the embeddings of all layers occupy a narrow cone within the vector space. Other work involves probing tasks, as proposed in Hewitt et al. (2019) [16]. These tasks consist of training an auxiliary classifier on top of a model, where the contextualized embeddings serve as features to predict syntactic (e.g., part-of-speech tags) and semantic (e.g., word relations) properties of words. The idea is that if the auxiliary classifier accurately predicts a linguistic property, we can assume that the property is encoded in the tested model. In line with this work, Coenen et al. (2019) [17] investigate the capability of word sense prediction and indicate that earlier-layer embeddings contain significantly more semantic information than conventionally believed.</p>
      <p>Thus, our experiments are motivated by the latter finding and inspired by linguistic research that highlights the influential role of morphology and syntax in shaping word meanings (Wysocki and Jenkins, 1987 [18]). In this paper, we test the hypothesis that word meanings should be investigated by considering the full output of pre-trained models, encompassing not only the semantic features of the last layers but also the intricate interplay of semantic, surface, and syntactic features present in the middle and lower layers of contextualized models.</p>
    </sec>
    <sec id="sec-3">
      <title>3. System overview</title>
      <p>Our system is a simple threshold-based classifier based on the similarity of two sets of word vectors. In particular, given a pair of sentences ⟨s1, s2⟩ and a target word w, we use the output embeddings of a contextualized embedding model to compute a continuous similarity score. This score indicates the extent to which the target w carries the same meaning in the sentences s1 and s2.</p>
      <p>More precisely, consider a sentence s that contains the word w. Given a contextualized model m, a vector representation of w is extracted from every layer of the model m. This way, the word w in the sentence s is associated with a set of contextualized embeddings denoted by W. It is worth noting that W ∈ R^(l×d), where l is the number of encoders of the model m (e.g., 12) and d is the dimension of the embeddings (e.g., 768). As a result, we denote as W1 and W2 the contextualized embeddings of w extracted from the sentences s1 and s2, respectively.</p>
      <p>In order to evaluate the similarity of the word w in the contexts s1 and s2, we collect the pairwise cosine similarities between W1 and W2. We denote as M the similarity matrix between W1 and W2 (see Figure 3 as an example). Our hypothesis is that taking into account information from all layers at once will provide a richer and more comprehensive picture of the nature of usage similarities of a word between the two sentences. We hypothesize that, because many layers are known to capture relevant semantic information, we should consider as many of them as possible together, as they may contain more comprehensive information than a single-layer comparison approach.</p>
      <p>In order to tap into this pool of similarity scores encoded within M (which contains 144 times more information than a single layer), we use a measure called the condition number. The condition number of a matrix, which has already been successfully applied in other NLP domains (Dubossarsky et al., 2020 [19]), provides us with a unified measure that takes into account the many similarity scores between the representations of w in the pair s1 and s2 throughout the different layers.</p>
      <p>Originally, the condition number of a matrix was used to measure its sensitivity to perturbations, or small changes, in its input. A large condition number indicates that the matrix is ill-conditioned, meaning it is sensitive to small perturbations. On the other hand, a small condition number indicates that the matrix is well-conditioned, meaning that small changes will not affect it much.</p>
      <p>In the setting of the WiC task, we interpret the condition number of a similarity matrix as associated with the stability of meaning between the two sentences. Overall higher similarity scores in M indicate two similar word usages and are expected to produce a lower (and better) condition number. On the other hand, less similar and more varied similarity scores indicate more unrelated usages, resulting in a higher (and worse) condition number.</p>
      <p>The condition number of a matrix is defined as the product of the matrix's norm and the norm of its reciprocal (i.e., the inverse of the matrix). The norm can be the Euclidean norm, the Max norm, the Frobenius norm, etc. In our experiments, we calculate the condition number (COND) of the similarity matrix M using the Frobenius norm as follows:</p>
      <p>COND(M) = ‖M‖ · ‖M⁻¹‖</p>
      <p>When we compute the condition number from the similarity matrix M, we assess the degree of semantic similarity of a word w in each pair ⟨s1, s2⟩ as COND(M). For ease of interpretation, in our experiments we utilize the negated metric −COND, associating smaller numbers with unrelated usages (annotated as 1) and larger numbers with identical usages (annotated as 4).</p>
      <p>Furthermore, we also investigate the similarity by considering only a subset of M. We test COND computed on the first, middle, and last four layers of the model m, respectively.</p>
      <p>For the sake of comparison, we set as reference baselines the cosine similarity (CS) of the w embeddings extracted from each layer of the model m individually, meaning that we compute l different CS scores as</p>
      <p>CS(W1[i], W2[i]) = (W1[i] · W2[i]) / (‖W1[i]‖ ‖W2[i]‖), with i ∈ {1, ..., l}.</p>
      <p>Additionally, we compute the cosine similarity CS between the word embeddings obtained by averaging the last four embeddings of W1 and W2, respectively.</p>
      <p>In line with the WiC-ITA guidelines, we compute the Spearman correlation between the estimated similarity scores and the gold answers. This serves as the evaluation metric for Subtask 2. In Subtask 1, our binary predictions are derived from the similarity scores obtained in Subtask 2. We employ a threshold-based classifier, selecting the threshold value that optimizes the F1 score on the set of sentence pairs used as training set.</p>
    </sec>
    <sec id="sec-3a">
      <title>4. Experimental setup</title>
      <p>In this task, we compared two different contextualized multilingual models, namely mBERT (Devlin et al., 2019 [2]) and XLM-R (Conneau et al., 2020 [20]). We use the Transformers library by HuggingFace to extract contextual word embeddings from the mBERT and XLM-R models without performing any fine-tuning (Wolf et al., 2020 [21]). We use the base versions, with 12 layers and 768 hidden dimensions: bert-base-multilingual-cased and xlm-roberta-base, respectively.</p>
      <p>Given a target word w and a pair ⟨s1, s2⟩, the acquisition of contextual embeddings is done by feeding the models with the sentences s1 and s2 individually. For every sentence, we extract the token embedding for the target word w from each layer of the model. Due to the byte-pair input encoding scheme employed by BERT-like models, some tokens may not correspond to complete words but rather to word pieces. In such cases, when a word is split into multiple tokens, we build a single word embedding by averaging the embeddings of its constituent word pieces.</p>
      <p>Finally, to assess the graded word similarity in the context of a pair of sentences, we calculate similarity scores between the contextualized embeddings of the target word under consideration (see Section 3).</p>
    </sec>
    <sec id="sec-3b">
      <title>5. Experimental results</title>
      <p>In our submissions, we rely on XLM-R as it proved to be more effective than mBERT. To maximize the performance of our system, we leverage the available train and dev sets as a whole. In particular, we randomly generate 100 different train-test splits, with sizes of 2000 and 1305 respectively (equivalent to 60% and 40% of the full dataset). We conduct cross-validation on these 100 splits to validate the use of COND for Subtask 2. Additionally, we leverage cross-validation to determine the optimal threshold for Subtask 1, meaning that we rely on the average of the 100 best thresholds obtained during cross-validation. The average scores of Spearman correlation, Precision, Recall, and F1 are presented in Table 1 for each tested measure. For Subtask 1 and Subtask 2, and for both the L and XL settings, our three submissions correspond to the top three measures based on the F1 score and the Spearman correlation, respectively (i.e., COND and two of its layer-subset variants).</p>
      <p>For the sake of comparison, Table 2 presents the preliminary performance achieved during the development phase with both XLM-R and mBERT over the Dev and Train sets (we report in bold the best result for each metric, model, and data set). Motivated by the superior results achieved during the development phase, we relied on XLM-R for our final submissions. It is worth noting that, in Table 2, COND also emerged as the leading measure for the mBERT model, proving its consistency. Moreover, we note that for the WiC-ITA task, the embeddings from the last layer of both XLM-R and mBERT, as well as the embeddings derived by aggregating the last four layers, are not as effective as those from other layers. For instance, it is interesting to observe that layer 8 seems to be effective for Subtask 1.</p>
      <p>In the final evaluation leaderboard for the WiC-ITA task, we ranked 2nd for L-Subtask1, 1st for XL-Subtask1, 2nd for L-Subtask2, and 1st for XL-Subtask2. The leaderboard, which includes the teams BERT 4EVER, LG, extremITA, the organizers' Baseline, and our team The Time-Embedding Travelers, is reported in Table 3.</p>
      <p>Our final results at WiC-ITA demonstrate that COND effectively captures semantic features of word meanings and can be successfully applied to tasks like WiC. Based on our development results, we assert that COND consistently outperforms the CS measure computed over individual contextualized embeddings, for Subtasks 1 and 2 in both the L and XL settings. This is particularly interesting considering that CS is commonly utilized in NLP tasks to capture contextual semantics in contextualized embeddings.</p>
      <p>Finally, COND consistently achieves good results by considering the middle layers alone. These results are in line with the findings of Coenen et al. (2019) [17], and suggest that the middle layers of BERT-like models contain valuable information for effectively representing meaning. Therefore, future work should explore the application of COND to WiC and other related NLP tasks such as Lexical Semantic Change Detection (Montanelli and Periti, 2023 [4]).</p>
    </sec>
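    <p>As a minimal, self-contained sketch of the COND measure, the following NumPy snippet uses random vectors in place of real XLM-R embeddings; the noise level, random seed, and the choice of layers 5-8 for the middle-layer variant are our own illustrative assumptions, not the authors' actual code.</p>
    <preformat>
```python
import numpy as np

def cosine_similarity_matrix(W1, W2):
    """Pairwise cosine similarities between two sets of per-layer word
    embeddings of shape (l, d); returns the (l, l) similarity matrix M."""
    n1 = W1 / np.linalg.norm(W1, axis=1, keepdims=True)
    n2 = W2 / np.linalg.norm(W2, axis=1, keepdims=True)
    return n1 @ n2.T

def cond(M):
    """Frobenius-norm condition number: COND(M) = ||M||_F * ||M^-1||_F."""
    return np.linalg.norm(M, "fro") * np.linalg.norm(np.linalg.inv(M), "fro")

rng = np.random.default_rng(0)
l, d = 12, 768  # number of layers and hidden size of a base-sized model

# Target word in two very similar contexts: near-identical embeddings.
W1 = rng.normal(size=(l, d))
W2_same = W1 + 0.01 * rng.normal(size=(l, d))

# Target word in two unrelated contexts: independent embeddings.
W2_diff = rng.normal(size=(l, d))

cond_same = cond(cosine_similarity_matrix(W1, W2_same))
cond_diff = cond(cosine_similarity_matrix(W1, W2_diff))

# A layer-subset variant, e.g. restricted to the middle four layers.
cond_mid = cond(cosine_similarity_matrix(W1[4:8], W2_same[4:8]))

# Similar usages are expected to yield a lower (better) condition number
# than unrelated usages.
print(cond_same, cond_mid, cond_diff)
```
    </preformat>
    <p>Under this sketch, cond_same stays close to the theoretical minimum of the Frobenius condition number (the matrix dimension), while cond_diff is substantially larger, mirroring the intended reading of COND as an inverse similarity measure.</p>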
    <sec id="sec-4">
      <title>6. Conclusion</title>
      <p>Our experiments for the WiC-ITA shared task ranked 2nd for L-Subtask1, 1st for XL-Subtask1, 2nd for L-Subtask2, and 1st for XL-Subtask2. In our submissions, we use the condition number of the cosine similarity matrix between XLM-R embeddings extracted from different layers. Our results support our initial hypothesis that leveraging all the information provided by the pre-trained model can be beneficial.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work has in part been funded by the project Towards Computational Lexical Semantic Change Detection supported by the Swedish Research Council (2019–2022; contract 2018-01184), and in part by the research program Change is Key! supported by Riksbankens Jubileumsfond (under reference number M21-0021).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient Estimation of Word Representations in Vector Space, in: Proc. of the ICLR Workshop, Scottsdale, Arizona, 2013.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in: Proc. of NAACL-HLT, ACL, Minneapolis, Minnesota, 2019, pp. 4171-4186.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] B. Scarlini, T. Pasini, R. Navigli, With More Contexts Comes Better Performance: Contextualized Sense Embeddings for All-Round Word Sense Disambiguation, in: Proc. of EMNLP, ACL, Online, 2020, pp. 3528-3539.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] S. Montanelli, F. Periti, A Survey on Contextualised Semantic Shift Detection, 2023. arXiv:2304.01666.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] M. T. Pilehvar, J. Camacho-Collados, WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations, in: Proc. of NAACL-HLT, ACL, Minneapolis, Minnesota, 2019, pp. 1267-1273.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] D. Loureiro, A. D'Souza, A. N. Muhajab, I. A. White, G. Wong, L. Espinosa-Anke, L. Neves, F. Barbieri, J. Camacho-Collados, TempoWiC: An Evaluation Benchmark for Detecting Meaning Shift in Social Media, in: Proc. of COLING, Gyeongju, Republic of Korea, 2022, pp. 3353-3359.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] A. Raganato, T. Pasini, J. Camacho-Collados, M. T. Pilehvar, XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization, in: Proc. of EMNLP, ACL, Online, 2020, pp. 7193-7206.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Q. Liu, E. M. Ponti, D. McCarthy, I. Vulić, A. Korhonen, AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples, in: Proc. of EMNLP, ACL, 2021.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is All You Need, in: Advances in Neural Information Processing Systems, volume 30, Curran Associates, Inc., 2017.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] G. Jawahar, B. Sagot, D. Seddah, What Does BERT Learn about the Structure of Language?, in: Proc. of ACL, ACL, Florence, Italy, 2019, pp. 3651-3657.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] K. Ethayarajh, How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings, in: Proc. of EMNLP-IJCNLP, ACL, Hong Kong, China, 2019, pp. 55-65.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] J. Hewitt, P. Liang, Designing and Interpreting Probes with Control Tasks, in: Proc. of EMNLP-IJCNLP, ACL, Hong Kong, China, 2019, pp. 2733-2743.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] A. Coenen, E. Reif, A. Yuan, B. Kim, A. Pearce, F. Viégas, M. Wattenberg, Visualizing and Measuring the Geometry of BERT, in: Advances in Neural Information Processing Systems, Curran Associates, Red Hook, NY, USA, 2019.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] K. Wysocki, J. R. Jenkins, Deriving word meanings through morphological generalization, Reading Research Quarterly (1987) 66-81.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] H. Dubossarsky, I. Vulić, R. Reichart, A. Korhonen, The Secret is in the Spectra: Predicting Cross-lingual Task Performance with Spectral Similarity Measures, in: Proc. of EMNLP, ACL, Online, 2020, pp. 2377-2390.</mixed-citation>
      </ref>
      <ref id="ref19b">
        <mixed-citation>[20] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised Cross-lingual Representation Learning at Scale, in: Proc. of ACL, Online, 2020.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <article-title>Context across Low-Resource Languages with Ad-</article-title>
          arXiv:
          <year>1911</year>
          .02116.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          versarial Examples,
          <source>in: Proc. of EMNLP</source>
          , ACL, Punta [21]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          , C. De-
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Cana</surname>
            ,
            <given-names>Dominican</given-names>
          </string-name>
          <string-name>
            <surname>Republic</surname>
          </string-name>
          ,
          <year>2021</year>
          , pp.
          <fpage>7151</fpage>
          -
          <lpage>7162</lpage>
          . langue, A. Moi,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cistac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Louf</surname>
          </string-name>
          , M. Fun[9]
          <string-name>
            <given-names>F.</given-names>
            <surname>Martelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kalach</surname>
          </string-name>
          , G. Tola, R. Navigli, SemEval- towicz, J. Davison,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shleifer</surname>
          </string-name>
          , P. von Platen, C. Ma,
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <source>2021 Task</source>
          <volume>2</volume>
          : Multilingual and
          <string-name>
            <surname>Cross-lingual Word- Y. Jernite</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Plu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>T. Le</given-names>
          </string-name>
          <string-name>
            <surname>Scao</surname>
          </string-name>
          , S. Gugger,
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <surname>of</surname>
            <given-names>SemEval</given-names>
          </string-name>
          , ACL, Online,
          <year>2021</year>
          , pp.
          <fpage>24</fpage>
          -
          <lpage>36</lpage>
          .
          <article-title>of-the-Art Natural Language Processing</article-title>
          , in: Proc. [10]
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Armendariz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Purver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ulčar</surname>
          </string-name>
          , S. Pollak,
          <string-name>
            <surname>of</surname>
            <given-names>EMNLP</given-names>
          </string-name>
          , ACL, Online,
          <year>2020</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>N.</given-names>
            <surname>Ljubešić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Granroth-Wilding</surname>
          </string-name>
          , CoSimLex: A [22]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dubossarsky</surname>
          </string-name>
          , Computa-
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <article-title>Resource for Evaluating Graded Word Similarity in tional modeling of semantic change</article-title>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>Context</surname>
          </string-name>
          , in
          <source>: Proc. of LREC</source>
          , ELRA, Marseille, France, arXiv:
          <fpage>2304</fpage>
          .
          <fpage>06337</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <year>2020</year>
          , pp.
          <fpage>5878</fpage>
          -
          <lpage>5886</lpage>
          . [23]
          <string-name>
            <given-names>F.</given-names>
            <surname>Periti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ferrara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Montanelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ruskov</surname>
          </string-name>
          , What [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Cassotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Siciliani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Passaro</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Gatto, is Done is Done: an Incremental Approach to Se-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <given-names>P.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <article-title>WiC-ITA at EVALITA2023: Overview mantic Shift Detection</article-title>
          ,
          <source>in: Proceedings of the 3rd</source>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <article-title>of the EVALITA2023 Word-in-Context for</article-title>
          ITAlian Workshop on Computational Approaches to His-
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          2023. tational Linguistics, Dublin, Ireland,
          <year>2022</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>[</lpage>
          12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Menini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Polignano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Russo</surname>
          </string-name>
          , R. Sprug- 43. URL: https://aclanthology.org/
          <year>2022</year>
          .lchange-
          <volume>1</volume>
          .4.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <string-name>
            <surname>noli</surname>
            , G. Venturi,
            <given-names>EVALITA</given-names>
          </string-name>
          <year>2023</year>
          :
          <article-title>Overview of the doi</article-title>
          :
          <volume>10</volume>
          .18653/v1/
          <year>2022</year>
          .lchange-
          <volume>1</volume>
          .4.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <article-title>8th Evaluation Campaign of Natural Language Pro-</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>