<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CL-IMS @ DIACR-Ita: Volente o Nolente: BERT does not Outperform SGNS on Semantic Change Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Severin Laicher</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gioia Baldissin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Enrique Castañeda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dominik Schlechtweg</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sabine Schulte im Walde</string-name>
          <email>schulte@ims.uni-stuttgart.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Natural Language Processing, University of Stuttgart</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present the results of our participation in the DIACR-Ita shared task on lexical semantic change detection for Italian. We exploit Average Pairwise Distance of token-based BERT embeddings between time points and rank 5 (of 8) in the official ranking with an accuracy of .72. While we tune parameters on the English data set of SemEval-2020 Task 1 and reach high performance, this does not translate to the Italian DIACR-Ita data set. Our results show that we do not manage to find robust ways to exploit BERT embeddings in lexical semantic change detection.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Lexical Semantic Change (LSC) Detection has
drawn increasing attention in recent years
        <xref ref-type="bibr" rid="ref10 ref21">(Kutuzov et al., 2018; Tahmasebi et al., 2018)</xref>
        . Recently,
SemEval-2020 Task 1 provided a multi-lingual
evaluation framework to compare the variety of
proposed model architectures
        <xref ref-type="bibr" rid="ref19 ref7">(Schlechtweg et al.,
2020)</xref>
        . The DIACR-Ita shared task extends parts
of this framework to Italian by providing an Italian
data set for SemEval’s binary subtask
        <xref ref-type="bibr" rid="ref2 ref2 ref3 ref3">(Basile et
al., 2020a; Basile et al., 2020b)</xref>
        . We present the
results of our participation in the DIACR-Ita shared
task on lexical semantic change for Italian. We
exploit Average Pairwise Distance of token-based
BERT embeddings (Devlin et al., 2019) between
time points and rank 5 (of 8) in the official ranking
with an accuracy of .72. While we tune parameters
on the English data set of SemEval-2020 Task 1
and reach high performance, this does not transfer
to the Italian DIACR-Ita data set. Our results show
that we do not manage to find robust ways to
exploit BERT embeddings in lexical semantic change
detection.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>
        Most existing approaches for LSC detection are
type-based
        <xref ref-type="bibr" rid="ref18 ref20">(Schlechtweg et al., 2019; Shoemark
et al., 2019)</xref>
        . This means that not every word
occurrence is considered individually (token-based)
but a general vector representation that summarizes
every occurrence of a word (including ambiguous
words) is created. The results of the SemEval-2020
Task 1
        <xref ref-type="bibr" rid="ref11 ref19 ref7">(Martinc et al., 2020; Schlechtweg et al.,
2020)</xref>
        showed that type-based approaches
        <xref ref-type="bibr" rid="ref1">(Pražák
et al., 2020b; Asgari et al., 2020)</xref>
        achieved better
results than token-based approaches
        <xref ref-type="bibr" rid="ref1 ref11 ref13 ref14 ref5 ref7 ref8 ref9">(Beck, 2020;
Kutuzov and Giulianelli, 2020a)</xref>
        . This is
somewhat surprising since in recent years
contextualized token-based approaches have achieved
significant improvements over the static type-based
approaches in several NLP tasks
        <xref ref-type="bibr" rid="ref4">(Ethayarajh, 2019)</xref>
        .
Schlechtweg et al. (2020) suggest a range of
possible reasons for this: (i) Contextual embeddings
are new and lack proper usage conventions. (ii)
They are pre-trained and may thus carry additional,
and possibly irrelevant, information. (iii) The
context of word uses in the SemEval data set was too
narrow (one sentence). (iv) The SemEval corpora
were lemmatized, while token-based models
usually take the raw sentence as input. In the
DIACR-Ita challenge, (iii) and (iv) are irrelevant because
raw corpora with sufficient context are made
available to participants. We tried to tackle (i) by
extensively tuning parameters and system modules on
the English SemEval data set. (ii) can be tackled by
fine-tuning BERT on the target corpora. However,
our experiments on the English SemEval data set
show that exceptionally high performances can be
reached even without fine-tuning.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Experimental setup</title>
      <p>
        The DIACR-Ita task definition is taken from
SemEval-2020 Task 1 Subtask 1 (binary change
detection): Given a list of target words and a
diachronic corpus pair C1 and C2, the task is to identify
the target words which have changed their
meanings between the respective time periods t1 and t2
        <xref ref-type="bibr" rid="ref19 ref2 ref3 ref7">(Basile et al., 2020a; Schlechtweg et al., 2020)</xref>
          . (The time periods t1 and t2 were not disclosed to participants.)
C1 and C2 have been extracted from Italian
newspapers and books. Target words which have changed
their meaning are labeled with the value ‘1’, the
remaining target words are labeled with ‘0’. Gold
data for the 18 target words is semi-automatically
generated from Italian online dictionaries.
According to the gold data, 6 of the 18 target words are
subject to semantic change between t1 and t2. This
gold data was only made public after the
evaluation phase. During the evaluation phase each team
was allowed to submit up to 4 predictions for the
full list of target words, which were scored using
classification accuracy between the predicted labels
and the gold data. The final competition ranking
compares only the highest of the scores achieved
by each team.
      </p>
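        <p>
          The scoring described above can be sketched in Python (a minimal illustration; the function name and dict-based interface are our own, not the official evaluation script):
        </p>

```python
def accuracy(predicted, gold):
    """Classification accuracy between predicted and gold binary labels.

    predicted, gold: dicts mapping each target word to 0 or 1.
    """
    # Submissions had to cover the full list of target words.
    assert predicted.keys() == gold.keys()
    correct = sum(predicted[w] == gold[w] for w in gold)
    return correct / len(gold)
```

        <p>
          For instance, labeling 13 of the 18 target words correctly yields an accuracy of 13/18 ≈ .72.
        </p>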
    </sec>
    <sec id="sec-4">
      <title>4 System Overview</title>
      <p>Our model uses BERT to create token vectors and
the Average Pairwise Distance to compare the token
vectors from the two time periods. The following sections
present our model, how we trained it, and
how we chose our submissions.</p>
      <sec id="sec-4-1">
        <title>4.1 BERT</title>
        <p>
          In 2018, Google released a model pre-trained on
Wikipedia and books of different
genres (Devlin et al., 2019): BERT (Bidirectional
Encoder Representations from Transformers) is a
language representation model designed to build
representations for text by analysing its left and right
contexts (Devlin et al., 2019). Peters et al. (2018)
show that contextual word representations derived
from pre-trained bidirectional language models like
BERT and ELMo yield significant improvements
to the state-of-the-art for a wide range of NLP tasks.
BERT can be used to analyse the semantics of
individual words by creating contextualized word
representations: vectors that are sensitive to the
context in which they appear
          <xref ref-type="bibr" rid="ref4">(Ethayarajh, 2019)</xref>
          .
BERT can either create one vector for an input
sentence (sentence embedding) or one vector for each
input token (token embedding). The code of our
system is available at https://github.com/Garrafao/TokenChange.
        </p>
        <p>Different pre-trained BERT models across
languages can be downloaded. In this task, we
used the bert-base-italian-xxl-cased model for
Italian (https://huggingface.co/dbmdz/bert-base-italian-xxl-cased) to create token embeddings.</p>
        <p>The basic BERT version is transformer-based
and processes text in 12 different layers. In each
layer a contextualized token vector representation
can be created for each word in an input sentence.
It has been claimed that each layer captures
different aspects of the input. Jawahar et al. (2019)
suggest that the lower layers capture surface
features, the middle layers capture syntactic features
and the higher layers capture semantic features of
the text. Each layer by itself, or a combination
of multiple layers, can serve as the representation
of the corresponding token.</p>
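        <p>
          A layer combination of this kind can be sketched as follows, assuming the per-layer hidden states for a sentence are available as an array of shape (n_layers, seq_len, dim) (the array layout, function name, and mode names are our assumptions):
        </p>

```python
import numpy as np

def combine_layers(hidden_states, mode="last_four"):
    """Build token representations from per-layer hidden states.

    hidden_states: array of shape (n_layers, seq_len, dim), one entry
    per BERT layer output for a single sentence.
    mode: 'last_four' averages the last four layers; 'first_last'
    averages the first and the last layer.
    Returns an array of shape (seq_len, dim).
    """
    h = np.asarray(hidden_states, dtype=float)
    if mode == "last_four":
        return h[-4:].mean(axis=0)
    if mode == "first_last":
        return (h[0] + h[-1]) / 2.0
    raise ValueError(f"unknown mode: {mode}")
```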
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Average Pairwise Distance</title>
        <p>
          Given two sets of token vectors from two time
periods t1 and t2, the idea of Average Pairwise Distance
(APD) is to randomly pick a number of vectors
from both sets and measure their pairwise distance
          <xref ref-type="bibr" rid="ref1 ref11 ref13 ref14 ref15 ref17 ref5 ref5 ref7 ref8 ref8 ref9 ref9">(Sagi et al., 2009; Schlechtweg et al., 2018;
Giulianelli et al., 2020; Beck, 2020; Kutuzov and
Giulianelli, 2020b)</xref>
          . The LSC score of the word is the
mean pairwise distance over all comparisons:
APD(V, W) = 1 / (n_V · n_W) · Σ_{v ∈ V, w ∈ W} d(v, w)
        </p>
        <p>
          where V and W are the two sets of vectors, n_V and
n_W denote the numbers of vectors to be compared,
and d(v, w) is a distance measure (we used
cosine distance
          <xref ref-type="bibr" rid="ref16">(Salton and McGill, 1983)</xref>
          ).
        </p>
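        <p>
          As a concrete sketch, the APD computation can be written in Python with NumPy (a minimal illustration; the function and variable names are ours, not part of the official system):
        </p>

```python
import numpy as np

def apd(V, W):
    """Average Pairwise Distance between two sets of token vectors.

    V, W: arrays of shape (n_V, dim) and (n_W, dim), one row per
    word use in the respective time period.
    Returns the mean cosine distance over all cross-period pairs.
    """
    V = np.asarray(V, dtype=float)
    W = np.asarray(W, dtype=float)
    # Normalize rows so that the dot product equals cosine similarity.
    Vn = V / np.linalg.norm(V, axis=1, keepdims=True)
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    # Cosine distance d(v, w) = 1 - cos(v, w) for every pair (v, w);
    # taking the mean divides by n_V * n_W.
    distances = 1.0 - Vn @ Wn.T
    return float(distances.mean())
```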
      </sec>
      <sec id="sec-4-3">
        <title>4.3 Tuning</title>
        <p>
          The choice of BERT layers and the measure used
to compare the resulting vectors (e.g. APD, COS
or clustering) strongly influence the performance
          <xref ref-type="bibr" rid="ref1 ref11 ref13 ref14 ref5 ref7 ref8 ref9">(Kutuzov and Giulianelli, 2020a)</xref>
          . Hence, we tuned
these parameters/modules on the English SemEval
data
          <xref ref-type="bibr" rid="ref19 ref7">(Schlechtweg et al., 2020)</xref>
          . For the 40 English
target words we had access to the sentences that
were used for the human annotation (in contrast
to task participants, who had access only to the
lemmatized larger corpora containing more target
word uses than just the annotated ones).</p>
        <p>We tested several change measures regarding
their ability to find the actual changing words. As
part of our tuning, the APD measure produced the
binary and graded LSC scores that best matched
the gold LSC scores. We also tested the token
vectors from different layers in order to check which
one fits best to our task. The best layer
combinations were the average of the last four layers and
the average of the first and last layer of BERT. The
highest F1-score for the binary subtask was .75,
and the highest Spearman correlation for the graded
subtask was .65. These results outperformed all official
submissions to the shared task, the best of which were
all type-based.</p>
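        <p>
          For reference, the Spearman correlation used for the graded subtask can be computed as follows (a tie-free sketch using the textbook rank formula; the official scorer may handle ties differently):
        </p>

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation between two score lists (no ties assumed)."""
    def ranks(a):
        order = np.argsort(a)
        r = np.empty(len(a))
        r[order] = np.arange(1, len(a) + 1)
        return r
    rx, ry = ranks(np.asarray(x, dtype=float)), ranks(np.asarray(y, dtype=float))
    n = len(rx)
    d = rx - ry
    # rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    return float(1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1)))
```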
      </sec>
      <sec id="sec-4-4">
        <title>4.4 Threshold Selection</title>
        <p>
          We created four predicted change rankings for the
target words with BERT+APD. Based on experience
with previous shared tasks
          <xref ref-type="bibr" rid="ref19 ref7">(Schlechtweg et
al., 2020)</xref>
          , we assumed that at most half of all
target words actually changed their meaning.
Therefore, we always labeled at most 9 of the 18 words
with 1. First, we extracted for each target word a
maximum of 200 sentences that contain the word
in any token form. We limited the number of uses
to 200 for computational efficiency reasons. Then,
for each occurrence, we extracted and averaged the
token vectors of (i) the last four layers of BERT,
and (ii) the first and last layer. For our first
submission (‘Last Four, 7’) we labeled those 7 words
with ‘1’ that achieved the highest APD scores in
layer combination (i). For our second submission
(‘First + Last, 7’) we labeled those 7 words with
‘1’ that achieved the highest APD scores in layer
combination (ii). In (i) and (ii) the same 9 words
had the highest APD scores. Therefore, in our third
submission (‘Average, 9’) exactly these 9 words
were labeled with ‘1’. For our last submission
(‘Lemma, Average, 6’) we extracted only sentences
in which the target words were present in their
lemma form. Again we created the token vectors
for the two layer combinations of BERT mentioned
above. In both layer combinations the
same 6 words had the highest APD scores.
Therefore, in our last submission exactly these 6 words
were labeled with ‘1’ (analogously to submission 3).
        </p>
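          <p>
            The submission procedure amounts to thresholding an APD ranking; a sketch (the function name and interface are ours):
          </p>

```python
def binarize_ranking(apd_scores, n_changed):
    """Label the n_changed highest-scoring target words '1', the rest '0'.

    apd_scores: dict mapping target word -> APD score.
    Returns a dict mapping target word -> 0/1 prediction.
    """
    ranked = sorted(apd_scores, key=apd_scores.get, reverse=True)
    changed = set(ranked[:n_changed])
    return {word: int(word in changed) for word in apd_scores}
```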
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Results</title>
      <p>
        Table 1 shows the accuracy scores for the different
submissions. The best result was achieved by
combining the first and last layer of BERT (‘First + Last,
7’ with .72), just as on the SemEval data. The
second-best result was obtained by using the
sentences where the target word occurred in its lemma
form (‘Lemma, Average, 6’ with .67). Only these
two submissions outperformed the task baselines
and the majority class baseline. The two lowest
results were achieved by combining the last four
layers of BERT (‘Last Four, 7’ with .61) and by
averaging the two layer combinations (‘Average,
9’ with .61). The accuracy of our best submission
(.72) was ranked at position 5 in the shared task,
where the best result was achieved by two
different submissions and reached an accuracy of .94.
Both submissions were based on type-based
embeddings
        <xref ref-type="bibr" rid="ref7">(Pražák et al., 2020a; Kaiser et al., 2020)</xref>
        ,
clearly outperforming our system.
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Accuracy scores for our submissions and the task baselines (‘Thresh.’ = number of target words labeled with ‘1’).</p>
        </caption>
        <table>
          <thead>
            <tr><th>Submission</th><th>Thresh.</th><th>Accuracy</th></tr>
          </thead>
          <tbody>
            <tr><td>First + Last</td><td>7</td><td>.72</td></tr>
            <tr><td>Lemma, Average</td><td>6</td><td>.67</td></tr>
            <tr><td>Majority Class Baseline</td><td/><td/></tr>
            <tr><td>Average</td><td>9</td><td>.61</td></tr>
            <tr><td>Last Four</td><td>7</td><td>.61</td></tr>
            <tr><td>Collocations Baseline</td><td/><td/></tr>
            <tr><td>Frequency Baseline</td><td/><td/></tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-5-4">
        <p>As mentioned above, the best performance of our
system, achieved with ‘First + Last, 7’, has an
accuracy of .72. It erroneously predicts a meaning
change for cappuccio, unico and campionato, while
for palmare and rampante it does not detect the
change as given by the gold standard.</p>
        <p>We compared both corpora in order to find out if
the target words are correctly labeled by the gold
standard as well as to identify the possible reasons
behind the wrong predictions of our model.</p>
        <p>According to our analysis, we can state that the
data matches the gold standard. Cappuccio is
polysemous across both time periods t1 and t2 (“hood”,
“cap”). However, 31% of the uses in t2 are
uppercased, i.e. proper nouns (in contrast to the 4%
in t1), which might imply a different sense
compared to the above-mentioned ones:
(1) BENEVENTO Il desiderio di il potere , il
potere di il desiderio : ruota intorno a questo
inquietante ( e attualissimo ) spunto il Festival
di Benevento diretto da Ruggero Cappuccio .
‘BENEVENTO The desire of the power, the
power of the desire: the Festival di Benevento
directed by Ruggero Cappuccio revolves
around this unsettling (and current) cue.’
This skewed distribution of proper names in the
two corpora is a possible reason for the wrong
prediction of our model.</p>
        <p>Throughout all target words, we noticed that the
context provided by the previous and the following
sentences (as given as input to our model) is often
not related topic-wise; in some instances it seems
as if the sentences are headlines, since they refer to
different topics:
(2) M ROMA Sono quindici gli articoli in cui è
suddiviso il provvedimento « antiracket » [...].
Roberta Serra ha vinto ieri lo slalom gigante
di il campionati italiani femminili .
‘M ROMA The «antiracket» measure is
divided into fifteen articles [...]. Roberta
Serra won yesterday the giant slalom of the
Italian female championship.’
(3) ... le uniche azioni pericolose fiorentine sono
arrivate quando il pallone e statu giocato su i
lati di il Campo . costruzione di centrali
idroelettriche , di miniere , canali e strade ...
‘...the only dangerous Florentine actions
arrived when the ball was played on the sides
of the field. Construction of hydroelectric
power plants, mines, channels and streets...’
This “headlines effect” occurs across the whole
corpus. It can be traced back to the extraction
process of the original corpus and may be a main
source of error for our model. Although not
representative, the following example shows that
in some cases no context window of any size centered
on the target would avoid unrelated context.
(4) REPARTO CONFEZIONI UOMO GIACCA
cameriere bianca , in tessuto L’ unica cosa
certa è che il governo ha ricevuto una dura
lezione da i professori .
‘MEN’S TAILORING DEPARTMENT white
textile waiter JACKET The only certain thing
is that the government has received a hard
lesson by the professors.’</p>
        <p>Unico is another example of a word that was
erroneously predicted as changing. Due to its abstract
meaning (“only”, “single”, “unique”), it exhibits
heterogeneous context across both time periods.
Additionally, it can belong to different word classes
(noun and adjective in (5) and (6), respectively).
(5) Rischiamo di rimanere gli unici a non aver
dato mano a la ristrutturazione di le Forze
Armate .
‘We risk remaining the only ones not having
helped in the reorganization of the Armed
Forces.’
(6) ... è chiaro che l’ unica cosa da fare sarebbe l’
unificazione di le due aziende comunali ...
‘...it is clear that the only thing to do would be
the unification of the two municipal
companies...’
With regard to the undetected changes, the term
palmare (polysemous within and across word
classes) acquires a novel sense in t2. While it
mostly has the meaning of “evident” in the 22
sentences of t1 (see (7)), it additionally denotes
“palmtop” in t2 (see (8)).
(7) ... con evidenza palmare , la impossibilità di
difendere una causa perduta ...
‘with undeniable evidence, the impossibility
of defending a lost cause’
(8) Per i palestinesi occorre una sistemazione
provvisoria in attesa che gli europei si
accordino per accoglier li . Potremmo citare
in il lungo elenco il palmare Apple Newton
troppo in anticipo su i tempi
‘A temporary arrangement is needed for the
Palestinians while waiting for the Europeans
to agree on hosting them. We could quote in
the long list the palmtop Apple Newton too
far ahead of its time’
Note that also in (8), the topic of the previous and
the target sentence is unrelated.</p>
        <p>Rampante is a further case of undetected change.
The phrase cavallino rampante, which
metonymically denotes “Ferrari”, dominates the usage of the
word in t1 (70%) and still covers a relevant
share of the uses in t2 (19%). We hypothesize that
this leads to a large number of homogeneous usage
pairs that mask the change of rampante from “rampant”,
“unbridled” to “extremely ambitious”.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>Our system comprising BERT+APD ranked 5th
in the DIACR-Ita shared task. The combination of
BERT and APD did not perform as well as expected,
and much worse than the best type-based
embeddings, but our best submission still outperformed
all baselines. The high tuning results achieved on
the SemEval data could not be transferred to the
Italian data. One reason for this may be that a
different BERT model was applied, trained on text of
a different language. We have not tuned the Italian
BERT model. It is therefore possible that the
decrease in performance may be due to the change of
the underlying BERT model. Furthermore, given
that our model considers as input also the
previous and the following sentences, the presence of
semantically unrelated context could have played a
significant role in mislabeling the target words.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>Dominik Schlechtweg was supported by the
Konrad Adenauer Foundation and the CRETA center
funded by the German Ministry for Education and
Research (BMBF) during the conduct of this study.
We thank the task organizers and reviewers for their
efforts.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Ehsaneddin</given-names>
            <surname>Asgari</surname>
          </string-name>
          , Christoph Ringlstetter, and
          <string-name>
            <given-names>Hinrich</given-names>
            <surname>Schütze</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>EmbLexChange at SemEval-2020 Task 1: Unsupervised Embedding-based Detection of Lexical Semantic Changes</article-title>
          .
          <source>In Proceedings of the 14th International Workshop on Semantic Evaluation</source>
          , Barcelona, Spain. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          , Annalina Caputo, Tommaso Caselli, Pierluigi Cassotti, and
          <string-name>
            <given-names>Rossella</given-names>
            <surname>Varvara</surname>
          </string-name>
          .
          <year>2020a</year>
          .
          <article-title>DIACR-Ita @ EVALITA2020: Overview of the EVALITA 2020 Diachronic Lexical Semantics (DIACR-Ita) Task</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of the 7th evaluation campaign of Natural Language Processing</source>
          and
          <article-title>Speech tools for Italian (EVALITA 2020), Online</article-title>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Danilo Croce, Maria Di Maro, and
          <string-name>
            <given-names>Lucia C.</given-names>
            <surname>Passaro</surname>
          </string-name>
          . 2020b.
          <article-title>EVALITA 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for Italian</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020)</source>
          , Online. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref3a">
        <mixed-citation>
          <string-name>
            <given-names>Christin</given-names>
            <surname>Beck</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>DiaSense at SemEval-2020 Task 1: Modeling sense change via pre-trained BERT embeddings</article-title>
          .
          <source>In Proceedings of the 14th International Workshop on Semantic Evaluation</source>
          , Barcelona, Spain. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref3b">
        <mixed-citation>
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          , Ming-Wei Chang, Kenton Lee, and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers), pages
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          , Minneapolis, Minnesota, June. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Kawin</given-names>
            <surname>Ethayarajh</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>How contextual are contextualized word representations? comparing the geometry of BERT, ELMo, and GPT-2 embeddings</article-title>
          .
          <source>In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          , pages
          <fpage>55</fpage>
          -
          <lpage>65</lpage>
          , Hong Kong, China. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Mario</given-names>
            <surname>Giulianelli</surname>
          </string-name>
          , Marco Del Tredici, and Raquel Fernández.
          <year>2020</year>
          .
          <article-title>Analysing lexical semantic change with contextualised word representations</article-title>
          .
          <source>In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>3960</fpage>
          -
          <lpage>3973</lpage>
          , Online, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Ganesh</given-names>
            <surname>Jawahar</surname>
          </string-name>
          , Benoît Sagot, and
          <string-name>
            <given-names>Djamé</given-names>
            <surname>Seddah</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>What does BERT learn about the structure of language?</article-title>
          <source>In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>3651</fpage>
          -
          <lpage>3657</lpage>
          , Florence, Italy, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Jens</given-names>
            <surname>Kaiser</surname>
          </string-name>
          , Dominik Schlechtweg, and Sabine Schulte im Walde.
          <year>2020</year>
          .
          <article-title>OP-IMS @ DIACR-Ita: Back to the Roots: SGNS+OP+CD still rocks Semantic Change Detection</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of the 7th evaluation campaign of Natural Language Processing</source>
          and
          <article-title>Speech tools for Italian (EVALITA 2020), Online</article-title>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Andrey</given-names>
            <surname>Kutuzov</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mario</given-names>
            <surname>Giulianelli</surname>
          </string-name>
          .
          <year>2020a</year>
          .
          <article-title>UiOUvA at SemEval-2020 Task 1: Contextualised Embeddings for Lexical Semantic Change Detection</article-title>
          .
          <source>In Proceedings of the 14th International Workshop on Semantic Evaluation</source>
          , Barcelona, Spain. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Andrey</given-names>
            <surname>Kutuzov</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mario</given-names>
            <surname>Giulianelli</surname>
          </string-name>
          .
          <year>2020b</year>
          .
          <article-title>UiOUvA at SemEval-2020 Task 1: Contextualised Embeddings for Lexical Semantic Change Detection</article-title>
          .
          <source>In Proceedings of the 14th International Workshop on Semantic Evaluation</source>
          , Barcelona, Spain. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Andrey</given-names>
            <surname>Kutuzov</surname>
          </string-name>
          , Lilja Øvrelid, Terrence Szymanski, and
          <string-name>
            <given-names>Erik</given-names>
            <surname>Velldal</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Diachronic word embeddings and semantic shifts: A survey</article-title>
          .
          <source>In Proceedings of the 27th International Conference on Computational Linguistics</source>
          , pages
          <fpage>1384</fpage>
          -
          <lpage>1397</lpage>
          , Santa Fe, New Mexico, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Matej</given-names>
            <surname>Martinc</surname>
          </string-name>
          , Syrielle Montariol, Elaine Zosa, and
          <string-name>
            <given-names>Lidia</given-names>
            <surname>Pivovarova</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Discovery Team at SemEval-2020 Task 1: Context-sensitive Embeddings not Always Better Than Static for Semantic Change Detection</article-title>
          .
          <source>In Proceedings of the 14th International Workshop on Semantic Evaluation</source>
          , Barcelona, Spain. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Peters</surname>
          </string-name>
          , Mark Neumann, Luke Zettlemoyer, and
          <string-name>
            <given-names>Wen-tau</given-names>
            <surname>Yih</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Dissecting contextual word embeddings: Architecture and representation</article-title>
          .
          <source>In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>1499</fpage>
          -
          <lpage>1509</lpage>
, Brussels, Belgium, October-November. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
<string-name>
  <given-names>Ondřej</given-names>
  <surname>Pražák</surname>
</string-name>
, Pavel Přibáň, and Stephen Taylor. 2020a.
          <article-title>UWB @ DIACR-Ita: Lexical Semantic Change Detection with CCA and Orthogonal Transformation</article-title>
. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
<source>Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020)</source>
, Online. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
<string-name>
  <given-names>Ondřej</given-names>
  <surname>Pražák</surname>
</string-name>
, Pavel Přibáň, Stephen Taylor, and Jakub Sido.
          <year>2020b</year>
.
<article-title>UWB at SemEval-2020 Task 1: Lexical Semantic Change Detection</article-title>
.
          <source>In Proceedings of the 14th International Workshop on Semantic Evaluation</source>
          , Barcelona, Spain. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Eyal</given-names>
            <surname>Sagi</surname>
          </string-name>
          , Stefan Kaufmann, and
          <string-name>
            <given-names>Brady</given-names>
            <surname>Clark</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Semantic density analysis: Comparing word meaning across time and phonetic space</article-title>
          .
          <source>In Proceedings of the Workshop on Geometrical Models of Natural Language Semantics</source>
          , pages
          <fpage>104</fpage>
          -
          <lpage>111</lpage>
          , Athens, Greece, March. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
<string-name>
  <given-names>Gerard</given-names>
  <surname>Salton</surname>
</string-name>
and
<string-name>
  <given-names>Michael J.</given-names>
  <surname>McGill</surname>
</string-name>
          .
          <year>1983</year>
          .
          <article-title>Introduction to Modern Information Retrieval</article-title>
          .
McGraw-Hill Book Company, New York.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Dominik</given-names>
            <surname>Schlechtweg</surname>
          </string-name>
          , Sabine Schulte im Walde, and
          <string-name>
            <given-names>Stefanie</given-names>
            <surname>Eckmann</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Diachronic Usage Relatedness (DURel): A framework for the annotation of lexical semantic change</article-title>
          .
          <source>In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , pages
          <fpage>169</fpage>
          -
          <lpage>174</lpage>
          , New Orleans, Louisiana, USA.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Dominik</given-names>
            <surname>Schlechtweg</surname>
          </string-name>
, Anna Hätty, Marco del Tredici, and Sabine Schulte im Walde
          .
          <year>2019</year>
          .
          <article-title>A Wind of Change: Detecting and evaluating lexical semantic change across times and domains</article-title>
          .
          <source>In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>732</fpage>
          -
          <lpage>746</lpage>
          , Florence, Italy. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Dominik</given-names>
            <surname>Schlechtweg</surname>
          </string-name>
          ,
<string-name>
  <given-names>Barbara</given-names>
  <surname>McGillivray</surname>
</string-name>
          ,
          <string-name>
            <given-names>Simon</given-names>
            <surname>Hengchen</surname>
          </string-name>
          , Haim Dubossarsky, and
          <string-name>
            <given-names>Nina</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          .
          <year>2020</year>
.
<article-title>SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection</article-title>
.
          <source>In Proceedings of the 14th International Workshop on Semantic Evaluation</source>
          , Barcelona, Spain. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Philippa</given-names>
            <surname>Shoemark</surname>
          </string-name>
          , Farhana Ferdousi Liza,
          <string-name>
            <given-names>Dong</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Scott</given-names>
            <surname>Hale</surname>
          </string-name>
          , and
<string-name>
  <given-names>Barbara</given-names>
  <surname>McGillivray</surname>
</string-name>
          .
          <year>2019</year>
          .
          <article-title>Room to Glo: A systematic comparison of semantic change detection approaches with word embeddings</article-title>
          .
          <source>In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing</source>
          , pages
          <fpage>66</fpage>
          -
          <lpage>76</lpage>
, Hong Kong, China. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Nina</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          , Lars Borin, and
          <string-name>
            <given-names>Adam</given-names>
            <surname>Jatowt</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Survey of computational approaches to diachronic conceptual change</article-title>
. arXiv:1811.06278.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>