OP-IMS @ DIACR-Ita: Back to the Roots: SGNS+OP+CD still Rocks Semantic Change Detection

Jens Kaiser, Dominik Schlechtweg, Sabine Schulte im Walde
Institute for Natural Language Processing, University of Stuttgart
{jens.kaiser,schlecdk,schulte}@ims.uni-stuttgart.de

Abstract

We present the results of our participation in the DIACR-Ita shared task on lexical semantic change detection for Italian. We exploit one of the earliest and most influential semantic change detection models based on Skip-Gram with Negative Sampling, Orthogonal Procrustes alignment and Cosine Distance and obtain the winning submission of the shared task with near to perfect accuracy (.94). Our results once more indicate that, within the present task setup in lexical semantic change detection, the traditional type-based approaches yield excellent performance.

1 Introduction

Lexical Semantic Change (LSC) Detection has drawn increasing attention in recent years (Kutuzov et al., 2018; Tahmasebi et al., 2018). Recently, SemEval-2020 Task 1 provided a multilingual evaluation framework to compare the variety of proposed model architectures (Schlechtweg et al., 2020). The DIACR-Ita shared task extends parts of this framework to Italian by providing an Italian data set for SemEval's binary subtask (Basile et al., 2020a; Basile et al., 2020b).

We present the results of our participation in the DIACR-Ita shared task exploiting one of the earliest and most established semantic change detection models based on Skip-Gram with Negative Sampling, Orthogonal Procrustes alignment and Cosine Distance (Hamilton et al., 2016a). Based on our previous research (Schlechtweg et al., 2019; Kaiser et al., 2020) we optimize the dimensionality parameter, assuming that high dimensionalities reduce alignment error. With our setting we win the shared task with near to perfect accuracy (.94). Our results once more demonstrate that, within the present task setup in lexical semantic change detection, the traditional type-based approaches yield excellent performance.

"Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)."

2 Related Work

As evident in Schlechtweg et al. (2020), the field of LSCD is currently dominated by Vector Space Models (VSMs), which can be divided into type-based (Turney and Pantel, 2010) and token-based (Schütze, 1998) models. Prominent type-based models include low-dimensional embeddings such as Global Vectors (Pennington et al., 2014, GloVe), the Continuous Bag-of-Words (CBOW) and Continuous Skip-gram models, as well as a slight modification of the latter, the Skip-gram with Negative Sampling model (Mikolov et al., 2013a; Mikolov et al., 2013b, SGNS). However, as these models come with the deficiency that they aggregate all senses of a word into a single representation, token-based embeddings have been proposed (Peters et al., 2018; Devlin et al., 2019). According to Hu et al. (2019), these models can ideally capture complex characteristics of word use, and how they vary across linguistic contexts. The results of SemEval-2020 Task 1 (Schlechtweg et al., 2020), however, show that, contrary to this, the token-based embedding models (Beck, 2020; Kutuzov and Giulianelli, 2020) are heavily outperformed by the type-based ones (Pražák et al., 2020; Asgari et al., 2020). The SGNS model was not only widely used, but also performed best among the participants in the task. Its fast implementation and combination possibilities with different alignment types further solidify SGNS as the standard in LSCD. A common and surprisingly robust (Schlechtweg et al., 2019; Kaiser et al., 2020) practice is to align the time-specific SGNS embeddings with Orthogonal Procrustes (OP) and measure change with Cosine Distance (CD) (Kulkarni et al., 2015; Hamilton et al., 2016b).
This has been shown in several small but independent experiments (Hamilton et al., 2016b; Schlechtweg et al., 2019; Kaiser et al., 2020; Shoemark et al., 2019), and SGNS+OP+CD has produced two of the three top-performing submissions in Subtask 2 of SemEval-2020 Task 1, including the winning submission (Pömsl and Lyapin, 2020; Arefyev and Zhikov, 2020).

3 System overview

Most VSMs in LSC detection combine three subsystems: (i) creating semantic word representations, (ii) aligning them across corpora, and (iii) measuring differences between the aligned representations (Schlechtweg et al., 2019). Alignment is needed as columns from different vector spaces may not correspond to the same coordinate axes, due to the stochastic nature of many low-dimensional word representations (Hamilton et al., 2016b). Following the above-described success, we use SGNS to create word representations in combination with Orthogonal Procrustes (OP) for vector space alignment and Cosine Distance (CD) (Salton and McGill, 1983) to measure differences between word vectors. From the resulting graded change predictions we infer binary change values by comparing the target word distribution to the full distribution of change predictions between the target corpora. For our experiments we use the code provided by Schlechtweg et al. (2019).¹

3.1 Semantic Representation

SGNS is a shallow neural network trained on pairs of word co-occurrences extracted from a corpus with a symmetric window. It represents each word w and each context c as a d-dimensional vector to solve

\arg\max_\theta \sum_{(w,c) \in D} \log \sigma(v_c \cdot v_w) + \sum_{(w,c) \in D'} \log \sigma(-v_c \cdot v_w),

where \sigma(x) = \frac{1}{1+e^{-x}}, D is the set of all observed word-context pairs and D' is the set of randomly generated negative samples (Mikolov et al., 2013a; Mikolov et al., 2013b; Goldberg and Levy, 2014). The optimized parameters \theta are v_{w_i} and v_{c_i} for i \in 1, ..., d. D' is obtained by drawing k contexts from the empirical unigram distribution P(c) = \frac{\#(c)}{|D|} for each observation of (w,c), cf. Levy et al. (2015). After training, each word w is represented by its word vector v_w.

Previous research on the influence of parameter settings on SGNS+OP+CD lays the foundation for our parameter choices (Schlechtweg et al., 2019; Kaiser et al., 2020). Although this subsystem combination is extremely stable regardless of parameter settings, subtle improvements can be achieved by modifying the window size and dimensionality. A common hurdle in LSC detection is the small corpus size; increasing the standard setting for the window size from 5 to 10 leads to the creation of more word-context pairs used for training the model. In addition, we also experiment with dimensionalities of 300 and 500. Higher dimensionalities alleviate the introduction of noise during the alignment process (Kaiser et al., 2020). We keep the rest of the parameter settings at their default values (learning rate α=0.025, number of negative samples k=5 and sub-sampling t=0.001).

3.2 Alignment

SGNS is trained on each corpus separately, resulting in matrices A and B. To align them we follow Hamilton et al. (2016b) and calculate an orthogonally-constrained matrix W^*:

W^* = \arg\min_{W \in O(d)} \|BW - A\|_F,

where the i-th rows of matrices A and B correspond to the same word. Using W^* we get the aligned matrices A^{OP} = A and B^{OP} = BW^*. Prior to this alignment step we length-normalize and mean-center both matrices (Artetxe et al., 2017; Schlechtweg et al., 2019).

¹ https://github.com/Garrafao/LSCDetection
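As an illustration of this alignment step (a minimal NumPy sketch under our notation, not the LSCDetection code we actually use; the function name and preprocessing order are our own), the closed-form OP solution is obtained from the SVD of B^T A:

```python
import numpy as np

def align_op(A, B):
    """Align B to A with Orthogonal Procrustes after length-normalizing
    rows and mean-centering columns (cf. Artetxe et al., 2017)."""
    def preprocess(X):
        X = X / np.linalg.norm(X, axis=1, keepdims=True)  # length-normalize rows
        return X - X.mean(axis=0, keepdims=True)          # mean-center columns
    A, B = preprocess(A), preprocess(B)
    # W* = argmin_{W in O(d)} ||BW - A||_F = U V^T, where U S V^T = SVD(B^T A)
    U, _, Vt = np.linalg.svd(B.T @ A)
    W = U @ Vt
    return A, B @ W  # A^OP, B^OP

# Sanity check: if B is a rotated copy of A, alignment should undo the rotation.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))
Q, _ = np.linalg.qr(rng.normal(size=(20, 20)))  # random orthogonal matrix
A_op, B_op = align_op(A, A @ Q)
```

Because both preprocessing steps commute with an orthogonal rotation, the recovered W equals Q^T in this synthetic check and the two aligned matrices coincide.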
3.3 Threshold

The DIACR-Ita shared task requires a binary label for each of the target words. However, CD produces graded values between 0.0 and 2.0 when measuring differences in word vectors between the two time periods. We tackle this problem by defining a threshold parameter, similar to many approaches applied in SemEval-2020 Task 1 (Schlechtweg et al., 2020). All words with a CD greater than or equal to the threshold are labeled '1', indicating change. Words with a CD less than the threshold are assigned '0', indicating no change.

A simplified approach is to set the threshold such that the number of words is equal in both groups. This has many disadvantages: mainly, it relies on the assumption that the two groups are of equal size. This is rarely given in real-world applications, especially if the focus is on one word at a time. Thus a more sophisticated approach is needed. In SemEval-2020's Subtask 1 many participants faced the same problem and developed various methods to solve it. Similar to the simplified approach, Zhou and Li (2020) only look at target words and, after fitting the histogram of CDs to a gamma distribution, set the threshold at the 75% density quantile. This approach resulted in good performance but is not always applicable due to its dependence on underlying properties of the test set. Amar and Liebeskind (2020) avoid the dependence on target words by randomly selecting 200 words and setting the threshold such that 90% of the 200 words have a lower distance than the threshold. A more careful selection of words is taken by Martinc et al. (2020): they look at the CDs of semantically stable stop words, accumulate them in different bins and set the threshold to the upper limit of the bin containing fewer than #stopwords/#bins words. Pražák et al. (2020) propose several methods; one of them is setting the threshold at the mean of the distances of all words in the corpus vocabulary. Our method for determining a threshold is very similar to Pražák et al. (2020), but instead of taking the mean, we use the mean plus one standard deviation (µ+σ) of all words in the corpus vocabulary.

entry             dim   threshold      ACC    AP
#2                300   (µ+σ) .76      .944   .915
#4                500   (µ+σ) .78      .889   .915
#1                300   (50:50) .57    .833   .915
#3                500   (50:50) .64    .833   .915
major. baseline   -     -              .667   .333
freq. baseline    -     unk.           .611   .418
colloc. baseline  -     unk.           .500   unk.

Table 1: Accuracy (ACC) and Average Precision (AP) for various parameter settings, thresholds and baselines; freq. baseline: absolute frequency difference between the words in C1 and C2 and an unknown threshold; colloc. baseline: Bag of Words + CD and an unknown threshold; major. baseline: every word labeled with '0'.

4 Experimental setup

The DIACR-Ita task definition is taken from SemEval-2020 Task 1 Subtask 1 (binary change detection): Given a list of target words and a diachronic corpus pair C1 and C2, the task is to identify the respective target words which have changed their meaning between the time periods t1 and t2 (Basile et al., 2020a; Schlechtweg et al., 2020).² C1 and C2 have been extracted from Italian newspapers and books. Target words which have changed their meaning are labeled with the value '1', the remaining target words are labeled with '0'. Gold data for the 18 target words is semi-automatically generated from Italian online dictionaries. According to the gold data, 6 of the 18 target words are subject to semantic change between t1 and t2. This gold data was only made public after the evaluation phase. During the evaluation phase each team was allowed to submit 4 predictions for the full list of target words, which were scored using classification accuracy between the predicted labels and the gold data. The final competition ranking compares only the highest of the 4 scores achieved by each team.

² The time periods t1 and t2 were not disclosed to participants.

5 Results

We created target word rankings using SGNS+OP+CD with a dimensionality of 300 and 500 as described above. From these rankings our predictions are calculated using two different thresholding methods: (i) splitting the targets into two equally-sized groups (50:50) and (ii) using the mean plus one standard deviation (µ+σ) as threshold (see Section 3.3). The accuracy scores achieved in this way are listed in Table 1, alongside the official baselines freq. and colloc. and an additional major. baseline. Submission #2 is our highest scoring submission and won the DIACR-Ita task together with one other undisclosed submission. For both of our rankings the 50:50 threshold yielded lower accuracy than the µ+σ threshold. This is due to the imbalance of changed to unchanged target words in the test set. Using µ+σ as threshold resulted in an optimal split for the ranking created with d=300. For d=500 this threshold was slightly too high with a value of 0.78. The target word palmare, which according to the gold data has undergone semantic change (label '1'), has a CD of 0.76 and was thus incorrectly labeled by our system. Figure 1 shows the histogram of CD values for all words of the corpus vocabulary in gray. The green and red colored bars correspond to target words.

[Figure 1: (a) d=300, (b) d=500. Background shows histogram (in gray) of CDs for all words in the corpus vocabulary. The colored bars show the CDs of target words: green indicates that the target word was correctly labeled, red indicates incorrect labeling. The vertical line marks the threshold value (mean + standard deviation).]
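For concreteness, the µ+σ rule of Section 3.3 can be sketched as follows (an illustrative sketch with hypothetical names, not our actual submission code), assuming two row-aligned matrices of already-aligned time-specific vectors:

```python
import numpy as np

def binary_change(A_op, B_op, vocab, targets):
    """Label each target word 1 (changed) if its cosine distance between
    the two aligned spaces is >= mean + std of the CDs of all vocab words."""
    norms = np.linalg.norm(A_op, axis=1) * np.linalg.norm(B_op, axis=1)
    cds = 1.0 - np.sum(A_op * B_op, axis=1) / norms  # cosine distance per word
    threshold = cds.mean() + cds.std()               # mu + sigma
    index = {w: i for i, w in enumerate(vocab)}
    return {w: int(cds[index[w]] >= threshold) for w in targets}

# Toy example: ten words, one of which ("w0") flips direction (CD = 2.0).
vocab = ["w%d" % i for i in range(10)]
A = np.tile(np.array([1.0, 0.0]), (10, 1))
B = A.copy()
B[0] = [-1.0, 0.0]
labels = binary_change(A, B, vocab, ["w0", "w5"])
```

In the toy example the threshold lands at 0.8 (mean 0.2 plus standard deviation 0.6), so only the flipped word is labeled as changed.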
If the target word was correctly labeled the bar is green; incorrectly labeled target words have red bars. From this visualisation we can see that there is a pronounced gap between the CDs of target words which have changed and those which have not. Our proposed threshold method of µ+σ tends to slightly overshoot this gap. This has led to the lower accuracy of submission #4, despite the ranking allowing for a higher accuracy. In order to measure the quality of the rankings independently of the threshold, we also report AP (Shwartz et al., 2017) in Table 1, confirming the potentially equal performance.

The method of using the mean plus one standard deviation of the CDs of all words in the corpus vocabulary resulted in good accuracy, but leaves room for improvement. It tends to slightly overshoot the gap between unchanged and changed words. Only using the mean shifts the tendency towards undershooting the gap. The optimal threshold seems to lie somewhere in between. However, this needs to be confirmed on other, larger data sets. Furthermore, not all binary classification tasks are suitable for the approach of first creating a ranked list of graded change predictions and then choosing a threshold. The data set of SemEval-2020 Task 1 comprises two tasks, a binary and a ranked task for the same target words. It is not possible to achieve an accuracy of 1 on the binary task even if all the ranks are predicted correctly for the graded task, i.e., binary change is not just high graded change (Schlechtweg et al., 2020).

The one target word which our model labels incorrectly, across a variety of parameter settings, is piovra. According to the gold data this word has not undergone semantic change between t1 and t2, while our system labels it as changed. A possible explanation for the error may be differences in frequency: in C1 piovra appears 35 times and in C2 it appears 643 times. SGNS often struggles to create reliable embeddings for low-frequency words (Kaiser et al., 2020). Alternatively, the error could be caused by discrepancies between gold labels and corpora. Basile et al. (2020a) state that the gold data is initially based on Italian online dictionaries such as 'Sabatini Coletti'. In a manual annotation process the gold data is further refined by providing human judges with up to 100 occurrences of each target word, for which they have to identify the used meaning according to the meanings listed in the dictionaries. A target word is labeled as changed if a meaning is observed in C2 which has not been observed in C1. Although not very likely, it is possible that this annotation method fails to detect novel senses in C2. Sabatini Coletti reports that, in addition to the sense "squid", piovra acquired a new sense "a secret criminal organisation deeply rooted in society" in 1983. This might explain why we detect piovra as a word which has undergone semantic change, given that C1 comprises texts from 1948 to 1970 and C2 comprises texts from 1990 to 2014 (Basile et al., 2020a).

The DIACR-Ita task data set is a very valuable contribution to the research field of LSC detection and extends the variety of available data sets to the Italian language. Nonetheless, two points are important when interpreting results on this data set: (i) It contains a small number of target words in combination with binary classification. This makes the data set vulnerable to randomness. (ii) Regarding the nature of the gold labels, in addition to possibly not being directly related to the corpus, it is unclear if they reflect semantic change as sense gain and sense loss as in SemEval's Subtask 1. The online dictionaries which form the basis for the gold data only state sense gains. Thus, it might be possible for a word to completely lose a sense but still be labeled as unchanged.

6 Conclusion

We participated in the DIACR-Ita shared task using well-established type-based methods for diachronic semantic representations in combination with a carefully calculated threshold. We were able to reach the first place with a nearly perfect accuracy of .94, confirming once more the reliability of the type-based embeddings created by SGNS, OP as an alignment method and CD to measure differences between word vectors. The presented approach is very suitable for similar tasks as no fine-tuning of parameters is needed. Yet, the system relies on the assumption that graded change is indicative of binary classes.

Acknowledgments

Dominik Schlechtweg was supported by the Konrad Adenauer Foundation and the CRETA center funded by the German Ministry for Education and Research (BMBF) during the conduct of this study. We thank the task organizers and reviewers for their efforts.

References

Efrat Amar and Chaya Liebeskind. 2020. JCT at SemEval-2020 Task 1: Combined Semantic Vector Spaces Models for Unsupervised Lexical Semantic Change Detection. In Proceedings of the 14th International Workshop on Semantic Evaluation, Barcelona, Spain. Association for Computational Linguistics.

Nikolay Arefyev and Vasily Zhikov. 2020. BOS at SemEval-2020 Task 1: Word Sense Induction via Lexical Substitution for Lexical Semantic Change Detection. In Proceedings of the 14th International Workshop on Semantic Evaluation, Barcelona, Spain. Association for Computational Linguistics.

Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2017. Learning bilingual word embeddings with (almost) no bilingual data. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 451–462. Association for Computational Linguistics.

Ehsaneddin Asgari, Christoph Ringlstetter, and Hinrich Schütze. 2020. EmbLexChange at SemEval-2020 Task 1: Unsupervised Embedding-based Detection of Lexical Semantic Changes. In Proceedings of the 14th International Workshop on Semantic Evaluation, Barcelona, Spain. Association for Computational Linguistics.

Pierpaolo Basile, Annalina Caputo, Tommaso Caselli, Pierluigi Cassotti, and Rossella Varvara. 2020a. DIACR-Ita @ EVALITA2020: Overview of the EVALITA2020 Diachronic Lexical Semantics (DIACR-Ita) Task. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020), Online. CEUR.org.

Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro. 2020b. EVALITA 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for Italian. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.

Christin Beck. 2020. DiaSense at SemEval-2020 Task 1: Modeling sense change via pre-trained BERT embeddings. In Proceedings of the 14th International Workshop on Semantic Evaluation, Barcelona, Spain. Association for Computational Linguistics.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June. Association for Computational Linguistics.

Yoav Goldberg and Omer Levy. 2014. Word2vec explained: Deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv:1402.3722.

William L. Hamilton, Jure Leskovec, and Dan Jurafsky. 2016a. Cultural shift or linguistic drift? Comparing two computational measures of semantic change. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2116–2121, Austin, Texas. Association for Computational Linguistics.
William L. Hamilton, Jure Leskovec, and Dan Jurafsky. 2016b. Diachronic word embeddings reveal statistical laws of semantic change. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 1489–1501, Berlin, Germany. Association for Computational Linguistics.

Renfen Hu, Shen Li, and Shichen Liang. 2019. Diachronic sense modeling with deep contextualized word embeddings: An ecological view. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3899–3908, Florence, Italy. Association for Computational Linguistics.

Jens Kaiser, Dominik Schlechtweg, Sean Papay, and Sabine Schulte im Walde. 2020. IMS at SemEval-2020 Task 1: How low can you go? Dimensionality in Lexical Semantic Change Detection. In Proceedings of the 14th International Workshop on Semantic Evaluation, Barcelona, Spain. Association for Computational Linguistics.

Vivek Kulkarni, Rami Al-Rfou, Bryan Perozzi, and Steven Skiena. 2015. Statistically significant detection of linguistic change. In Proceedings of the 24th International Conference on World Wide Web, WWW, pages 625–635, Florence, Italy.

Andrey Kutuzov and Mario Giulianelli. 2020. UiO-UvA at SemEval-2020 Task 1: Contextualised Embeddings for Lexical Semantic Change Detection. In Proceedings of the 14th International Workshop on Semantic Evaluation, Barcelona, Spain. Association for Computational Linguistics.

Andrey Kutuzov, Lilja Øvrelid, Terrence Szymanski, and Erik Velldal. 2018. Diachronic word embeddings and semantic shifts: a survey. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1384–1397, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Omer Levy, Yoav Goldberg, and Ido Dagan. 2015. Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.

Matej Martinc, Syrielle Montariol, Elaine Zosa, and Lidia Pivovarova. 2020. Discovery Team at SemEval-2020 Task 1: Context-sensitive Embeddings not Always Better Than Static for Semantic Change Detection. In Proceedings of the 14th International Workshop on Semantic Evaluation, Barcelona, Spain. Association for Computational Linguistics.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. In Yoshua Bengio and Yann LeCun, editors, 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, pages 3111–3119, Lake Tahoe, Nevada, USA.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1532–1543, Doha, Qatar.

Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2227–2237, New Orleans, LA, USA.

Martin Pömsl and Roman Lyapin. 2020. CIRCE at SemEval-2020 Task 1: Ensembling Context-Free and Context-Dependent Word Representations. In Proceedings of the 14th International Workshop on Semantic Evaluation, Barcelona, Spain. Association for Computational Linguistics.

Ondřej Pražák, Pavel Přibáň, Stephen Taylor, and Jakub Sido. 2020. UWB at SemEval-2020 Task 1: Lexical Semantic Change Detection. In Proceedings of the 14th International Workshop on Semantic Evaluation, Barcelona, Spain. Association for Computational Linguistics.

Gerard Salton and Michael J. McGill. 1983. Introduction to Modern Information Retrieval. McGraw-Hill Book Company, New York.

Dominik Schlechtweg, Anna Hätty, Marco del Tredici, and Sabine Schulte im Walde. 2019. A Wind of Change: Detecting and evaluating lexical semantic change across times and domains. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 732–746, Florence, Italy. Association for Computational Linguistics.

Dominik Schlechtweg, Barbara McGillivray, Simon Hengchen, Haim Dubossarsky, and Nina Tahmasebi. 2020. SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. In Proceedings of the 14th International Workshop on Semantic Evaluation, Barcelona, Spain. Association for Computational Linguistics.

Hinrich Schütze. 1998. Automatic word sense discrimination. Computational Linguistics, 24(1):97–123, March.

Philippa Shoemark, Farhana Ferdousi Liza, Dong Nguyen, Scott Hale, and Barbara McGillivray. 2019. Room to Glo: A systematic comparison of semantic change detection approaches with word embeddings. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pages 66–76, Hong Kong, China. Association for Computational Linguistics.

Vered Shwartz, Enrico Santus, and Dominik Schlechtweg. 2017. Hypernyms under siege: Linguistically-motivated artillery for hypernymy detection. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain, pages 65–75.

Nina Tahmasebi, Lars Borin, and Adam Jatowt. 2018. Survey of computational approaches to diachronic conceptual change. CoRR, abs/1811.06278.

Peter D. Turney and Patrick Pantel. 2010. From frequency to meaning: Vector space models of semantics. J. Artif. Int. Res., 37(1):141–188, January.

Jinan Zhou and Jiaxin Li. 2020. TemporalTeller at SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection with Temporal Referencing. In Proceedings of the 14th International Workshop on Semantic Evaluation, Barcelona, Spain. Association for Computational Linguistics.