    UWB @ DIACR-Ita: Lexical Semantic Change Detection with CCA and
                     Orthogonal Transformation
                       Ondřej Pražák*1,2, Pavel Přibáň*1,2, and Stephen Taylor*2
                            1 NTIS – New Technologies for the Information Society,
                            2 Department of Computer Science and Engineering,
                 Faculty of Applied Sciences, University of West Bohemia, Czech Republic
                            {ondfa, pribanp, taylor}@kiv.zcu.cz
                                      http://nlp.kiv.zcu.cz

                       Abstract

In this paper, we describe our method for detection of lexical semantic change (i.e., word sense changes over time) for the DIACR-Ita shared task, where we ranked 1st. We examine semantic differences between specific words in two Italian corpora, chosen from different time periods. Our method is fully unsupervised and language independent. It consists of preparing a semantic vector space for each corpus, earlier and later. Then we compute a linear transformation between the earlier and later spaces, using CCA and Orthogonal Transformation. Finally, we measure the cosines between the transformed vectors.

1   Introduction

Language evolves with time. New words appear, old words fall out of use, and the meanings of some words shift. There are changes in topics, syntax, and presentation structure. Reading the natural philosophy musings of aristocratic amateurs from the eighteenth century, and comparing them with a monograph from the nineteenth century or a medical study from the twentieth century, we can observe differences in many dimensions, some of which need a deep historical background to study. Changes in word senses are both a visible and a tractable part of language evolution.

Computational methods for researching the stories of words have the potential of helping us understand this small corner of linguistic evolution. The tools for measuring these diachronic semantic shifts might also be useful for measuring whether the same word is used in different ways in synchronic documents. The task of finding word sense changes over time is called diachronic Lexical Semantic Change (LSC) detection. The task has received growing attention in recent years (Hamilton et al., 2016b; Schlechtweg et al., 2017; Schlechtweg et al., 2020). There is also the synchronic LSC task, which aims to identify domain-specific changes of word senses compared to general-language usage (Schlechtweg et al., 2019).

* Equal contribution. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1.1   Related Work

Tahmasebi et al. (2018) provide a comprehensive survey of techniques for the LSC task, as do Kutuzov et al. (2018). Schlechtweg et al. (2019) evaluate available approaches for LSC detection using the DURel dataset (Schlechtweg et al., 2018). Schlechtweg et al. (2020) present the results of the first shared task that addresses the LSC problem and provide an evaluation dataset that was manually annotated for four languages.

According to Schlechtweg et al. (2019), there are three main types of approaches. (1) Semantic vector space approaches (Gulordava and Baroni, 2011; Eger and Mehler, 2016; Hamilton et al., 2016a; Hamilton et al., 2016b; Rosenfeld and Erk, 2018; Pražák et al., 2020) represent each word with two vectors for two different time periods; the change of meaning is then measured by some distance (usually the cosine distance) between the two vectors. (2) Topic modeling approaches (Bamman and Crane, 2011; Mihalcea and Nastase, 2012; Cook et al., 2014; Frermann and Lapata, 2016; Schlechtweg and Walde, 2020) estimate a probability distribution of words over their different senses, i.e., topics. (3) Clustering models (Mitra et al., 2015; Tahmasebi and Risse, 2017).

1.2   The DIACR-Ita task

The goal of the DIACR-Ita task (Basile et al., 2020a; Basile et al., 2020b) is to establish whether a set of Italian words (target words) changed their meaning from time period t1 to time period t2 (i.e., a binary classification task). The organizers provide corresponding corpora C1 and C2 and a list of target words. Only these inputs may be used to train systems, which judge, for each target word, whether it has changed or not. The task is the same as the binary sub-task of the SemEval-2020 Task 1 (Schlechtweg et al., 2020) competition.
2   Data

The DIACR-Ita data consists of many randomly ordered text samples that have no relationship to each other. Most of the text samples are complete sentences, but some are sentence fragments.

The 'early' corpus C1 has about 2.4 million text samples and 52 million tokens; the 'later' corpus C2 has about 7.8 million text samples and 738 million tokens. Each token is given in the corpora with its part-of-speech tag and lemma. The target word list consists of 18 lemmas. The POS tags and lemmas of the corpora are generated with the UDPipe (Straka, 2018) model ISDT-UD v2.5, which has an error rate of about 2%.

3   System Description

3.1   Overview

Because language evolves, the expressions, words, and sentence constructions in two corpora from different time periods about the same topic will be written in languages that are quite similar but slightly different. They will share the majority of their words, grammar, and syntax. We can observe a similar situation in languages from the same family, such as Italian-Spanish among the Romance languages or Czech-Slovak among the Slavic languages. These pairs of languages share a lot of common words, expressions, and syntax. For some pairs, native speakers can understand and sometimes even actively communicate through a (low) language barrier.
Our system follows the approach from (Pražák et al., 2020).¹ The main idea behind our solution is that we treat the pair of corpora C1 and C2 as two different languages L1 and L2, even though the text in both corpora is written in Italian. We believe that these two languages L1 and L2 will be extremely similar in all aspects, including semantics. We train a separate semantic space for each corpus and subsequently map these two spaces into one common cross-lingual space. We use methods for cross-lingual mapping (Brychcín et al., 2019; Artetxe et al., 2016; Artetxe et al., 2017; Artetxe et al., 2018a; Artetxe et al., 2018b), and thanks to the large similarity between L1 and L2, the quality of the transformation should be high. We compute the cosine similarity of the transformed word vectors to classify whether the target words changed their sense.

¹ The source code is available at https://github.com/pauli31/SemEval2020-task1

3.2   Semantic Space Transformation

First, we train two semantic spaces from corpora C1 and C2. We represent the semantic spaces by a matrix Xs (i.e., a source space s) and a matrix Xt (i.e., a target space t)² using word2vec Skip-gram with negative sampling (Mikolov et al., 2013). We perform a cross-lingual mapping of the two vector spaces, getting two matrices X̂s and X̂t projected into a shared space. We select two methods for the cross-lingual mapping: Canonical Correlation Analysis (CCA), using the implementation from (Brychcín et al., 2019), and a modification of the Orthogonal Transformation from VecMap (Artetxe et al., 2018b). Both of these methods are linear transformations, which can be written as follows:

    X̂s = Ws→t Xs                                        (1)

where Ws→t is a matrix that performs a linear transformation from the source space s (matrix Xs) into a target space t, and X̂s is the source space transformed into the target space t (the matrix Xt does not have to be transformed because Xt is already in the target space t and Xt = X̂t).

² The source space Xs is created from the corpus C1 and the target space Xt is created from the corpus C2.

Finally, in all transformation methods, for each word wi from the set of target words T, we select its corresponding vectors v^s_wi and v^t_wi from the matrices X̂s and X̂t, respectively (v^s_wi ∈ X̂s and v^t_wi ∈ X̂t), and we compute the cosine similarity between these two vectors. The cosine similarity is then used to generate a final classification output using different strategies; see Sections 3.5 and 3.6.
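To make the mapping-and-scoring step concrete, the following sketch shows it in Python. It is only an illustration under our notation, not the released implementation (see footnote 1 for the actual source code); `X_src`/`X_tgt`, `vocab_src`/`vocab_tgt`, `W`, and `targets` are assumed inputs.

```python
# A minimal sketch of the scoring step of Section 3.2 (assumptions: X_src and
# X_tgt are embedding matrices with one row per word, vocab_src and vocab_tgt
# map words to row indices, and W is a linear map already learned by one of
# the methods described below).
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def score_targets(X_src, X_tgt, vocab_src, vocab_tgt, W, targets):
    """Project each target's source vector into the target space (Eq. 1)
    and score it by cosine similarity against its target-space vector."""
    scores = {}
    for w in targets:
        x_hat = W @ X_src[vocab_src[w]]      # Eq. (1): x_hat = W_{s->t} x^s
        scores[w] = cosine(x_hat, X_tgt[vocab_tgt[w]])
    return scores
```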
3.3   Canonical Correlation Analysis

Generally, the CCA transformation transforms both spaces Xs and Xt into a third shared space o (where Xs ≠ X̂s and Xt ≠ X̂t). Thus, CCA computes two transformation matrices, Ws→o for the source space and Wt→o for the target space. The transformation matrices are computed by minimizing the negative correlation between the vectors x^s_i ∈ Xs and x^t_i ∈ Xt that are projected into the shared space o. The negative correlation is defined as follows:

    \operatorname*{argmin}_{W_{s \to o},\, W_{t \to o}} \; -\sum_{i=1}^{n} \rho\left(W_{s \to o}\, x_i^s,\; W_{t \to o}\, x_i^t\right)
      = -\sum_{i=1}^{n} \frac{\operatorname{cov}\left(W_{s \to o}\, x_i^s,\; W_{t \to o}\, x_i^t\right)}{\sqrt{\operatorname{var}\left(W_{s \to o}\, x_i^s\right) \times \operatorname{var}\left(W_{t \to o}\, x_i^t\right)}}    (2)

where cov is the covariance, var is the variance, and n is the number of vectors used for computing the transformation. In our implementation of CCA, the matrix X̂t is equal to the matrix Xt, because we transform only the source space s (matrix Xs) into the target space t from the common shared space with a pseudo-inversion, and the target space does not change. The matrix Ws→t for this transformation is then given by:

    Ws→t = Ws→o (Wt→o)⁻¹                                 (3)

The submissions that use CCA are referred to as cca-bin and cca-ranking in Table 1. The -bin and -ranking parts refer to the strategy used for the final classification decision; see Sections 3.5 and 3.6.
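A rough sketch of this computation is given below; it assumes row-aligned dictionary matrices `X_src` and `X_tgt` (one row per dictionary word in each space). We used the implementation of Brychcín et al. (2019), so this scikit-learn-based version is only illustrative, and it ignores the mean-centering that CCA implementations typically apply.

```python
# Illustrative CCA mapping following Eq. (3); not the implementation we used.
import numpy as np
from sklearn.cross_decomposition import CCA

def cca_mapping(X_src, X_tgt, n_components=100):
    """Return W mapping source rows into the target space (row-vector
    convention), i.e. W_{s->t} = W_{s->o} (W_{t->o})^{-1} as in Eq. (3)."""
    cca = CCA(n_components=n_components)
    cca.fit(X_src, X_tgt)
    W_so = cca.x_rotations_   # projects source rows into the shared space o
    W_to = cca.y_rotations_   # projects target rows into the shared space o
    # The pseudo-inverse maps vectors from the shared space o back into the
    # target space t, so only the source space is actually moved.
    return W_so @ np.linalg.pinv(W_to)

# usage (illustrative): X_hat_src = X_src @ W maps all source vectors into
# the target space; X_tgt stays where it is (X_t = X_hat_t).
```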
3.4   Orthogonal Transformation

In the case of the Orthogonal Transformation, the submission is referred to as ort-bin. We use the Orthogonal Transformation with a supervised seed dictionary consisting of all words common to both semantic spaces. The transformation matrix Ws→t is given by:

    \operatorname*{argmin}_{W_{s \to t}} \sum_{i}^{|V|} \left(W_{s \to t}\, x_i^s - x_i^t\right)^2    (4)

under the hard condition that Ws→t must be orthogonal, where V is the vocabulary of correct word translations from the source space Xs to the target space Xt, and x^s_i ∈ Xs and x^t_i ∈ Xt. The reason for the orthogonality constraint is that a linear transformation with an orthogonal matrix does not squeeze or re-scale the transformed space. It only rotates the space, and thus it preserves most of the relationships of its elements (in our case, it is important that the orthogonal transformation preserves angles between the words, so it preserves the cosine similarity).
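Under the orthogonality constraint, Eq. (4) is the orthogonal Procrustes problem, which has a closed-form solution via SVD. The snippet below is a sketch of that textbook solution, not of the VecMap modification we actually used; `X_src` and `X_tgt` are assumed to hold the row-aligned vectors of the seed dictionary.

```python
# Orthogonal Procrustes solution to Eq. (4) (illustrative sketch).
import numpy as np

def orthogonal_mapping(X_src, X_tgt):
    """Return an orthogonal W minimizing ||X_src @ W - X_tgt||_F
    (row-vector convention)."""
    # SVD of the cross-correlation matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(X_src.T @ X_tgt)
    return u @ vt

# Because W is orthogonal it only rotates the space: angles between word
# vectors, and hence their cosine similarities, are preserved.
```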
                                                           and distance estimates.
                                                              We use the number of embeddings, and not the
under the hard condition that Ws→t needs to be
                                                           total number of runs, to compute the standard error
orthogonal, where V is the vocabulary of correct
                                                           of the mean (which is standard deviation divided
word translations from source space Xs to target
                                                           by the square root of samples).
space Xt and xsi ∈ Xs and xti ∈ Xt . The rea-
son for the orthogonality constraint is that linear        4     Experimental Setup
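A minimal sketch of the binary-threshold decision follows, assuming `scores` maps each target word to its continuous score (the ort-bin variant would replace the mean with the midpoint of the largest gap, per footnote 3).

```python
# Binary-threshold strategy of Section 3.5 (sketch under the paper's rule:
# score above the averaged threshold = changed).
import numpy as np

def binary_threshold(scores):
    """scores: dict mapping each target word to its continuous score."""
    t = np.mean(list(scores.values()))
    # 1 = the word is labeled as changed, 0 = unchanged
    return {word: int(s > t) for word, s in scores.items()}
```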
3.6   Ranking Strategy

The ranking strategy is the second approach for generating a classification output (the submission result cca-ranking in Table 1). It uses the mean rank over repeated runs of each embedding pair. For each run, the target words are scored with a cosine distance. Then the distances for each embedding pair are sorted and a rank-order is assigned to each target. The rank-orders are averaged to get a mean rank (and a standard deviation) for each target for each pair. Finally, the ranks for all embedding pairs are averaged. The composite rank is used, along with an estimate of the associated cosine distance and its corresponding angle, to divide the target list into changed and unchanged sets. This does not work well; there are competing gaps in rank and distance estimates.

We use the number of embeddings, and not the total number of runs, to compute the standard error of the mean (which is the standard deviation divided by the square root of the number of samples).
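The per-pair rank aggregation can be sketched as follows; `runs` is an assumed list of per-run score dictionaries (cosine distances) for one embedding pair, and the same aggregation is then repeated across pairs. The direction of the sort (larger distance ranked first) is our illustrative choice.

```python
# Sketch of the mean-rank aggregation of Section 3.6 for one embedding pair.
import numpy as np

def mean_ranks(runs, targets):
    """Average the rank-order of each target word over repeated runs."""
    ranks = {w: [] for w in targets}
    for scores in runs:
        # larger cosine distance = stronger change = better (lower) rank
        ordered = sorted(targets, key=lambda w: scores[w], reverse=True)
        for rank, word in enumerate(ordered, start=1):
            ranks[word].append(rank)
    # mean rank and standard deviation per target word
    return {w: (float(np.mean(r)), float(np.std(r))) for w, r in ranks.items()}
```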
4   Experimental Setup

To obtain the semantic spaces, we employ Skip-gram with negative sampling (Mikolov et al., 2013). For the final submission, we trained the semantic spaces with 100 (the ort-bin submission) and 150 (the cca-bin submission) dimensions for five iterations with five negative samples and the window size set to five. Each word has to appear at least five times in the corpus to be used in the training. To train the semantic spaces, we used the lemmatized corpora. The dimensions 100 and 150 were selected based on our previous experience with these methods (Pražák et al., 2020). Since we were able to submit four different submissions, we did not use the same dimension for both methods.
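In gensim terms, this training configuration corresponds roughly to the following call; `lemmatized_sentences` is an assumed iterable of token lists, and the actual submission code may differ in details.

```python
# Skip-gram with negative sampling, with the hyperparameters stated above.
from gensim.models import Word2Vec

model = Word2Vec(
    sentences=lemmatized_sentences,
    vector_size=100,  # 100 for ort-bin, 150 for cca-bin
    sg=1,             # Skip-gram
    negative=5,       # five negative samples
    window=5,         # window size five
    min_count=5,      # a word must appear at least five times
    epochs=5,         # five iterations
)
word_vectors = model.wv  # the semantic space for one corpus
```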
The cca-ranking submission uses the same settings and dimensions 100-105, 110-115, etc., up to 210-215, resulting in 72 different dimension sizes. It combines 40 runs on each of the 72 embedding pairs, a total of 2880 runs.

For the cca-bin submission, we build the translation dictionary for the transformation of the two spaces by removing the target words from the intersection of their vocabularies. In the case of the cca-ranking submission, the dictionary in each run consists of up to 5000 randomly chosen common words for each semantic space.

The random submission represents output that was generated completely randomly.
4.1   Corpus variants

The organizers provided the corpora already tokenized in four different versions: original tokens; lemmatized tokens; original tokens with POS tags; and lemmatized tokens with POS tags. We experimented with each of these variants, although in the end, we used results based only on lemmas. Figure 1 shows the mean standard deviation of rank for target words over forty runs for each of the 72 different embedding sizes. The most consistent variant is the lemmas only.

[Figure 1 here. Line plot; y-axis: standard deviation of rank (0-4); x-axis: embedding size (100-220); one line per corpus variant: rawTokens, lemmas, rawTokens+POS, lemmas+POS.]

Figure 1: Standard deviation (of rank) versus embedding size for four versions of the corpora.

5   Results

We submitted four different submissions. The accuracy results for each submission are shown in Table 1. The ort-bin system achieved the best accuracy of 0.944 and ranked first⁴ among eight other teams in the shared task, classifying 17 out of 18 target words correctly. The cca-bin system achieved an accuracy of 0.889 (16 correct classifications out of 18). After the release of the gold labels, we performed an additional experiment in which the cca-bin system also achieved an accuracy of 0.944 when the same word embeddings are used as for the ort-bin system (embedding dimension 100 instead of 150). We found an optimal threshold for both systems, which makes them classify all the words correctly⁵.

We believe that the key factor in the success of our system is the sufficient size of the provided corpora. Thanks to that, we were able to train semantic spaces of good quality and thus achieve good results.

    System         Accuracy
    cca-bin          .889
    ort-bin          .944
    cca-ranking      .778
    random           .500

Table 1: Results for our final submissions.

⁴ We share the first place with another team that achieved the same accuracy.
⁵ That is, 100% accuracy was possible with the continuous scores of both methods if we only had an oracle to set the threshold.
6   Conclusion

Our systems based on Canonical Correlation Analysis and Orthogonal Transformation achieved the best accuracy of 0.944 in the shared task and ranked first among eight other teams. We showed that our approach is a suitable solution for the Lexical Semantic Change detection task. Applying a threshold to a semantic distance is a sensible architecture for detecting binary semantic change in target words between two corpora. Our binary-threshold strategy succeeded quite well.

This task provided plenty of text to build good word embeddings. Corpora with much smaller amounts of data might have increased the random variation between the earlier and later embeddings, which would have given our method problems. A flaw in our technique is that the semantic vectors are based on all senses of a word in the corpus. We do not yet have tools to tease out what kinds of changes are implied by a particular semantic distance between vectors. We considered using the part-of-speech data in the corpora, since different parts of speech for the same lemma are likely different senses. But placing the POS in the token, like using inflections instead of lemmas, results in many more, less well-trained semantic vectors, as suggested by Figure 1.

Acknowledgements

This work has been partly supported by ERDF "Research and Development of Intelligent Components of Advanced Technologies for the Pilsen Metropolitan Area (InteCom)" (no.: CZ.02.1.01/0.0/0.0/17_048/0007267); by the project LO1506 of the Czech Ministry of Education, Youth and Sports; and by Grant No. SGS-2019-018, Processing of heterogeneous data and its specialized applications. Access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the programme "Projects of Large Research, Development, and Innovations Infrastructures" (CESNET LM2015042), is greatly appreciated.

References

[Artetxe et al.2016] Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2016. Learning principled bilingual mappings of word embeddings while preserving monolingual invariance. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2289-2294, Austin, Texas, November. Association for Computational Linguistics.

[Artetxe et al.2017] Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2017. Learning bilingual word embeddings with (almost) no bilingual data. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 451-462, Vancouver, Canada, July. Association for Computational Linguistics.

[Artetxe et al.2018a] Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2018a. Generalizing and improving bilingual word embedding mappings with a multi-step framework of linear transformations. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pages 5012-5019.

[Artetxe et al.2018b] Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2018b. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 789-798, Melbourne, Australia, July. Association for Computational Linguistics.

[Bamman and Crane2011] David Bamman and Gregory Crane. 2011. Measuring historical word sense variation. In Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, JCDL '11, pages 1-10, New York, NY, USA. Association for Computing Machinery.

[Basile et al.2020a] Pierpaolo Basile, Annalina Caputo, Tommaso Caselli, Pierluigi Cassotti, and Rossella Varvara. 2020a. DIACR-Ita @ EVALITA2020: Overview of the EVALITA2020 Diachronic Lexical Semantics (DIACR-Ita) Task. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020), Online. CEUR.org.

[Basile et al.2020b] Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro. 2020b. EVALITA 2020: Overview of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.

[Brychcín et al.2019] Tomáš Brychcín, Stephen Taylor, and Lukáš Svoboda. 2019. Cross-lingual word analogies using linear transformations between semantic spaces. Expert Systems with Applications, 135:287-295.

[Cook et al.2014] Paul Cook, Jey Han Lau, Diana McCarthy, and Timothy Baldwin. 2014. Novel word-sense identification. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 1624-1635, Dublin, Ireland, August. Dublin City University and Association for Computational Linguistics.

[Eger and Mehler2016] Steffen Eger and Alexander Mehler. 2016. On the linearity of semantic change: Investigating meaning variation via dynamic graph models. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 52-58, Berlin, Germany, August. Association for Computational Linguistics.

[Frermann and Lapata2016] Lea Frermann and Mirella Lapata. 2016. A Bayesian model of diachronic meaning change. Transactions of the Association for Computational Linguistics, 4:31-45.

[Gulordava and Baroni2011] Kristina Gulordava and Marco Baroni. 2011. A distributional similarity approach to the detection of semantic change in the Google Books ngram corpus. In Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics, pages 67-71, Edinburgh, UK, July. Association for Computational Linguistics.
[Hamilton et al.2016a] William L. Hamilton, Jure Leskovec, and Dan Jurafsky. 2016a. Cultural shift or linguistic drift? Comparing two computational measures of semantic change. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2116-2121, Austin, Texas, November. Association for Computational Linguistics.

[Hamilton et al.2016b] William L. Hamilton, Jure Leskovec, and Dan Jurafsky. 2016b. Diachronic word embeddings reveal statistical laws of semantic change. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1489-1501, Berlin, Germany, August. Association for Computational Linguistics.

[Kutuzov et al.2018] Andrey Kutuzov, Lilja Øvrelid, Terrence Szymanski, and Erik Velldal. 2018. Diachronic word embeddings and semantic shifts: a survey. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1384-1397, Santa Fe, New Mexico, USA, August. Association for Computational Linguistics.

[Mihalcea and Nastase2012] Rada Mihalcea and Vivi Nastase. 2012. Word epoch disambiguation: Finding how words change over time. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 259-263, Jeju Island, Korea, July. Association for Computational Linguistics.

[Mikolov et al.2013] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In Proceedings of Workshop at ICLR. arXiv:1301.3781.

[Mitra et al.2015] Sunny Mitra, Ritwik Mitra, Suman Kalyan Maity, Martin Riedl, Chris Biemann, Pawan Goyal, and Animesh Mukherjee. 2015. An automatic approach to identify word sense changes in text media across timescales. Natural Language Engineering, 21(5):773-798.

[Pražák et al.2020] Ondřej Pražák, Pavel Přibáň, Stephen Taylor, and Jakub Sido. 2020. UWB at SemEval-2020 Task 1: Lexical semantic change detection. In Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval-2020), Barcelona, Spain, Sep. Association for Computational Linguistics.

[Rosenfeld and Erk2018] Alex Rosenfeld and Katrin Erk. 2018. Deep neural models of semantic shift. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 474-484, New Orleans, Louisiana, June. Association for Computational Linguistics.

[Schlechtweg and Walde2020] Dominik Schlechtweg and Sabine Schulte im Walde. 2020. Simulating lexical semantic change from sense-annotated data. In A. Ravignani, C. Barbieri, M. Martins, M. Flaherty, Y. Jadoul, E. Lattenkamp, H. Little, K. Mudd, and T. Verhoef, editors, The Evolution of Language: Proceedings of the 13th International Conference (EvoLang13).

[Schlechtweg et al.2017] Dominik Schlechtweg, Stefanie Eckmann, Enrico Santus, Sabine Schulte im Walde, and Daniel Hole. 2017. German in flux: Detecting metaphoric change via word entropy. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 354-367, Vancouver, Canada, August. Association for Computational Linguistics.

[Schlechtweg et al.2018] Dominik Schlechtweg, Sabine Schulte im Walde, and Stefanie Eckmann. 2018. Diachronic usage relatedness (DURel): A framework for the annotation of lexical semantic change. In Proceedings of NAACL-HLT 2018, pages 169-174.

[Schlechtweg et al.2019] Dominik Schlechtweg, Anna Hätty, Marco Del Tredici, and Sabine Schulte im Walde. 2019. A wind of change: Detecting and evaluating lexical semantic change across times and domains. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 732-746, Florence, Italy, July. Association for Computational Linguistics.

[Schlechtweg et al.2020] Dominik Schlechtweg, Barbara McGillivray, Simon Hengchen, Haim Dubossarsky, and Nina Tahmasebi. 2020. SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. In Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval-2020), Barcelona, Spain, Sep. Association for Computational Linguistics.

[Straka2018] Milan Straka. 2018. UDPipe 2.0 prototype at CoNLL 2018 UD shared task. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 197-207, Brussels, Belgium, October. Association for Computational Linguistics.

[Tahmasebi and Risse2017] Nina Tahmasebi and Thomas Risse. 2017. Finding individual word sense changes and their delay in appearance. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 741-749, Varna, Bulgaria, September. INCOMA Ltd.

[Tahmasebi et al.2018] Nina Tahmasebi, Lars Borin, and Adam Jatowt. 2018. Survey of computational approaches to lexical semantic change. arXiv preprint arXiv:1811.06278.