<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UWB @ DIACR-Ita: Lexical Semantic Change Detection with CCA and Orthogonal Transformation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ondřej Pražák</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavel Přibáň</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephen Taylor</string-name>
          <email>taylorg@kiv.zcu.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia</institution>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>NTIS - New Technologies for the Information Society</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we describe our method for the detection of lexical semantic change (i.e., word sense changes over time) for the DIACR-Ita shared task, where we ranked 1st. We examine semantic differences between specific words in two Italian corpora chosen from different time periods. Our method is fully unsupervised and language-independent. It consists of preparing a semantic vector space for each corpus, the earlier one and the later one. Then we compute a linear transformation between the earlier and the later space, using Canonical Correlation Analysis (CCA) and Orthogonal Transformation. Finally, we measure the cosine similarity between the transformed vectors.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Language evolves with time. New words appear,
old words fall out of use, and the meanings of
some words shift. There are changes in topics,
syntax, and presentation structure. Reading the
natural philosophy musings of aristocratic
amateurs from the eighteenth century and comparing
them with a monograph from the nineteenth century or
a medical study from the twentieth century, we can
observe differences in many dimensions, some of
which require a deep historical background to study.
Changes in word senses are both a visible and a
tractable part of language evolution.</p>
      <p>
        Computational methods for researching the
stories of words have the potential to help us
understand this small corner of linguistic
evolution. The tools for measuring these diachronic
semantic shifts might also be useful for
measuring whether the same word is used in different
ways in synchronic documents. The task of
finding word sense changes over time is called
diachronic Lexical Semantic Change (LSC)
detection. The task has received increasing attention in
recent years
        <xref ref-type="bibr" rid="ref1 ref13 ref14 ref19 ref2 ref21 ref22 ref25 ref6 ref7">(Hamilton et al., 2016b; Schlechtweg
et al., 2017; Schlechtweg et al., 2020)</xref>
        . There is
also the synchronic LSC task, which aims to
identify domain-specific changes of word senses
compared to general-language usage
        <xref ref-type="bibr" rid="ref24 ref8">(Schlechtweg et
al., 2019)</xref>
        .
        <fn><p>*Equal contribution. Copyright © 2020 for this paper by its
authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).</p></fn>
      </p>
      <sec id="sec-1-1">
        <title>1.1 Related Work</title>
        <p>
          Tahmasebi et al. (2018) provide a comprehensive
survey of techniques for the LSC task, as do
Kutuzov et al. (2018). Schlechtweg et al. (2019)
evaluate available approaches for LSC detection
using the DURel dataset
          <xref ref-type="bibr" rid="ref12 ref15 ref23 ref28 ref3">(Schlechtweg et al., 2018)</xref>
          .
Schlechtweg et al. (2020) present results of the
first shared task that addresses the LSC problem
and provide an evaluation dataset that was
manually annotated for four languages.
        </p>
        <p>
          According to Schlechtweg et al. (2019), there
are three main types of approaches. (1) Semantic
vector space approaches
          <xref ref-type="bibr" rid="ref1 ref1 ref10 ref11 ref13 ref13 ref14 ref14 ref19 ref20 ref25 ref5 ref6 ref7">(Gulordava and Baroni,
2011; Eger and Mehler, 2016; Hamilton et al.,
2016a; Hamilton et al., 2016b; Rosenfeld and Erk,
2018; Pražák et al., 2020)</xref>
          represent each word
with two vectors for two different time periods.
The change of meaning is then measured by some
distance (usually by the cosine distance) between
the two vectors. (2) Topic modeling approaches
          <xref ref-type="bibr" rid="ref10 ref11 ref16 ref21 ref25 ref5 ref9">(Bamman and Crane, 2011; Mihalcea and Nastase,
2012; Cook et al., 2014; Frermann and Lapata,
2016; Schlechtweg and Walde, 2020)</xref>
          estimate a
probability distribution of words over their
different senses, i.e., topics. Finally, there are (3) clustering models
          <xref ref-type="bibr" rid="ref18 ref27">(Mitra et al., 2015; Tahmasebi and Risse, 2017)</xref>
          .
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2 The DIACR-Ita task</title>
        <p>
          The goal of the DIACR-Ita task
          <xref ref-type="bibr" rid="ref19 ref19 ref25 ref25 ref6 ref6 ref7 ref7">(Basile et al.,
2020a; Basile et al., 2020b)</xref>
          is to determine whether each word in a set
of Italian target words changed its
meaning between time period t1 and time period t2 (i.e.,
a binary classification task). The organizers provide
the corresponding corpora C1 and C2 and a list of
target words. Only these inputs may be used to
train systems, which judge for each target word
whether its meaning has changed. The task is the same
as the binary sub-task of the SemEval-2020 Task
1
          <xref ref-type="bibr" rid="ref19 ref21 ref25 ref6 ref7">(Schlechtweg et al., 2020)</xref>
          competition.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2 Data</title>
      <p>The DIACR-Ita data consists of many randomly
ordered text samples that have no relationship to
each other. Most of the text samples are complete
sentences, but some are sentence fragments.</p>
      <p>
        The ‘early’ corpus C1 has about 2.4 million text
samples and 52 million tokens; the ‘later’ corpus
C2 has about 7.8 million text samples and 738
million tokens. Each token in the corpora is given
together with its part-of-speech (POS) tag and lemma. The
target word list consists of 18 lemmas. The POS tags and
lemmas of the corpora were generated with the
UDPipe
        <xref ref-type="bibr" rid="ref26">(Straka, 2018)</xref>
        model ISDT-UD v2.5, which
has an error rate of about 2%.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 System Description</title>
      <sec id="sec-3-1">
        <title>3.1 Overview</title>
        <p>Because language is evolving, expressions, words,
and sentence constructions in two corpora from
different time periods about the same topic will
be written in languages that are quite similar but
slightly different. They will share the
majority of their words, grammar, and syntax. We
can observe a similar situation in languages from
the same family, such as Italian-Spanish in
Romance languages or Czech-Slovak in Slavic
languages. These pairs of languages share a lot of
common words, expressions and syntax. For some
pairs, native speakers can understand and
sometimes even actively communicate through a (low)
language barrier.</p>
        <p>
          Our system follows the approach from
          <xref ref-type="bibr" rid="ref19 ref25 ref6 ref7">(Pražák
et al., 2020)</xref>
          .<sup>1</sup> The main idea behind our solution
is that we treat the two corpora C1 and C2
as two different languages L1 and L2, even though the
text of both corpora is written in Italian. We
believe that these two languages L1 and L2 will
be extremely similar in all aspects, including
semantics. We train a separate semantic space for
each corpus and subsequently map these two
spaces into one common cross-lingual space. We
use methods for cross-lingual mapping
          <xref ref-type="bibr" rid="ref1 ref12 ref12 ref13 ref14 ref15 ref15 ref2 ref22 ref23 ref23 ref24 ref28 ref28 ref3 ref3 ref8">(Brychcín
et al., 2019; Artetxe et al., 2016; Artetxe et al.,
2017; Artetxe et al., 2018a; Artetxe et al., 2018b)</xref>
          , and thanks to the high similarity between L1 and
L2, the quality of the transformation should be high.
We compute the cosine similarity of the transformed
word vectors to classify whether the target words
changed their sense.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Semantic Space Transformation</title>
        <p>
          First, we train two semantic spaces from corpora
C1 and C2. We represent the semantic spaces by a
matrix X<sup>s</sup> (i.e., a source space s) and a matrix X<sup>t</sup>
(i.e., a target space t)<sup>2</sup>, using word2vec Skip-gram
with negative sampling
          <xref ref-type="bibr" rid="ref17">(Mikolov et al., 2013)</xref>
          . We
perform a cross-lingual mapping of the two
vector spaces, obtaining two matrices X̂<sup>s</sup> and X̂<sup>t</sup>
projected into a shared space. We select two
methods for the cross-lingual mapping: Canonical
Correlation Analysis (CCA), using the implementation
from
          <xref ref-type="bibr" rid="ref24 ref8">(Brychcín et al., 2019)</xref>
          , and a modification
of the Orthogonal Transformation from VecMap
          <xref ref-type="bibr" rid="ref12 ref15 ref23 ref28 ref3">(Artetxe et al., 2018b)</xref>
          . Both of these methods are
linear transformations. The transformations can
be written as follows:
        </p>
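        <p>
          <disp-formula>
            <label>(1)</label>
            <tex-math>\hat{X}^{s} = W^{s \to t} X^{s}</tex-math>
          </disp-formula>
where W<sup>s→t</sup> is a matrix that performs a linear
transformation from the source space s (matrix
X<sup>s</sup>) into the target space t, and X̂<sup>s</sup> is the source space
transformed into the target space t. The matrix X<sup>t</sup>
does not have to be transformed because it is
already in the target space t, i.e., X̂<sup>t</sup> = X<sup>t</sup>.</p>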
        <p>Finally, for both transformation methods and for each
word w<sub>i</sub> from the set of target words T, we
select its corresponding vectors v<sup>s</sup><sub>w<sub>i</sub></sub> and v<sup>t</sup><sub>w<sub>i</sub></sub> from
the matrices X̂<sup>s</sup> and X̂<sup>t</sup>, respectively (v<sup>s</sup><sub>w<sub>i</sub></sub> ∈ X̂<sup>s</sup> and
v<sup>t</sup><sub>w<sub>i</sub></sub> ∈ X̂<sup>t</sup>), and we compute the cosine similarity
between these two vectors. The cosine similarity is
then used to generate the final classification output
using different strategies; see Sections 3.5 and 3.6.</p>
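        <p>The scoring step above can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the code of our submissions: word vectors are stored as matrix rows, so applying Eq. (1) to every vector x (computing W<sup>s→t</sup>x) becomes a right multiplication by the transpose of W<sup>s→t</sup>; the helper names and the word-to-row dictionaries are hypothetical.</p>
        <preformat preformat-type="code">
```python
import numpy as np


def apply_mapping(Xs: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Eq. (1) for row-stored vectors: every row x is replaced by W @ x."""
    return Xs @ W.T


def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def target_similarities(Xs, Xt, W, index_s, index_t, targets):
    """Map the source space with W and score every target word.

    index_s and index_t (hypothetical) map a word to its row in Xs and Xt.
    Xt is used unchanged because it already lives in the target space.
    """
    Xs_hat = apply_mapping(Xs, W)
    return {
        w: cosine_similarity(Xs_hat[index_s[w]], Xt[index_t[w]])
        for w in targets
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Xt = rng.normal(size=(3, 4))
    W = np.linalg.qr(rng.normal(size=(4, 4)))[0]  # a random orthogonal map
    Xs = Xt @ W  # built so that mapping Xs with W recovers Xt exactly
    idx = {"a": 0, "b": 1, "c": 2}
    print(target_similarities(Xs, Xt, W, idx, idx, ["a", "b"]))
```
        </preformat>
        <p>In the toy run, the source space is an exact rotation of the target space, so both scores come out as 1.0 up to floating-point error; on real embeddings, lower values indicate that the contexts of a word differ more between the two periods.</p>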
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Canonical Correlation Analysis</title>
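        <p>Generally, the CCA transformation maps
both spaces X<sup>s</sup> and X<sup>t</sup> into a third shared space
o (where X<sup>s</sup> ≠ X̂<sup>s</sup> and X<sup>t</sup> ≠ X̂<sup>t</sup>). Thus, CCA
computes two transformation matrices: W<sup>s→o</sup> for
the source space and W<sup>t→o</sup> for the target space.
The transformation matrices are computed by
minimizing the negative correlation between the
vectors x<sub>i</sub><sup>s</sup> ∈ X<sup>s</sup> and x<sub>i</sub><sup>t</sup> ∈ X<sup>t</sup> that are projected
into the shared space o. The negative correlation
is defined as follows:<fn><label>1</label><p>The source code is available at <ext-link ext-link-type="uri" xlink:href="https://github.com/pauli31/SemEval2020-task1">https://github.com/pauli31/SemEval2020-task1</ext-link></p></fn><fn><label>2</label><p>The source space X<sup>s</sup> is created from the corpus C1 and
the target space X<sup>t</sup> is created from the corpus C2.</p></fn></p>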
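        <p>
          <disp-formula>
            <label>(2)</label>
            <tex-math>\mathop{\mathrm{argmin}}_{W^{s\to o},\, W^{t\to o}} \; -\sum_{i=1}^{n} \rho\left(W^{s\to o} x_i^{s}, W^{t\to o} x_i^{t}\right) = -\sum_{i=1}^{n} \frac{\mathrm{cov}\left(W^{s\to o} x_i^{s}, W^{t\to o} x_i^{t}\right)}{\sqrt{\mathrm{var}\left(W^{s\to o} x_i^{s}\right)\,\mathrm{var}\left(W^{t\to o} x_i^{t}\right)}}</tex-math>
          </disp-formula>
where ρ is the correlation, cov is the covariance, var is the variance,
and n is the number of vectors used for
computing the transformation. In our implementation of
CCA, the matrix X̂<sup>t</sup> is equal to the matrix X<sup>t</sup>
because only the source space s (matrix X<sup>s</sup>) is transformed:
it is mapped into the common shared space and from there into the target space t
using a pseudo-inverse, so the target space does not change. The matrix W<sup>s→t</sup> for
this transformation is then given by:</p>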
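        <p>
          <disp-formula>
            <label>(3)</label>
            <tex-math>W^{s\to t} = W^{s\to o}\left(W^{t\to o}\right)^{-1}</tex-math>
          </disp-formula>
        </p>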
        <p>The submissions that use CCA are referred to as
cca-bin and cca-ranking in Table 1. The suffixes -bin and
-ranking refer to the strategy used for the final
classification decision; see Sections 3.5 and 3.6.</p>
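        <p>A compact way to obtain the two CCA projections and the matrix of Eq. (3) is sketched below. It is a textbook CCA (whitening followed by a singular value decomposition, which maximizes the canonical correlations, i.e., minimizes the negative correlation of Eq. (2)), not necessarily identical to the implementation of Brychcín et al. (2019) used in our submissions. The function names are hypothetical, the small ridge term eps is our addition for numerical stability, and vectors are stored as rows, so projections multiply from the right; in that convention, the source-to-target matrix is W<sup>s→o</sup> followed by the pseudo-inverse of W<sup>t→o</sup>, the same order as in Eq. (3).</p>
        <preformat preformat-type="code">
```python
import numpy as np


def inv_sqrt(C: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Inverse square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(C + eps * np.eye(C.shape[0]))
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def cca(Ds: np.ndarray, Dt: np.ndarray):
    """Textbook CCA on paired rows (row i of Ds and Dt: the same dictionary word).

    Returns W_so, W_to such that the columns of Ds @ W_so and Dt @ W_to are
    pairwise maximally correlated in the shared space o.
    """
    Ds = Ds - Ds.mean(axis=0)
    Dt = Dt - Dt.mean(axis=0)
    n = Ds.shape[0]
    Css = Ds.T @ Ds / (n - 1)
    Ctt = Dt.T @ Dt / (n - 1)
    Cst = Ds.T @ Dt / (n - 1)
    Ks, Kt = inv_sqrt(Css), inv_sqrt(Ctt)
    # Singular vectors of the whitened cross-covariance give the directions;
    # the singular values are the canonical correlations.
    U, _, Vt = np.linalg.svd(Ks @ Cst @ Kt, full_matrices=False)
    return Ks @ U, Kt @ Vt.T


def cca_source_to_target(Ds: np.ndarray, Dt: np.ndarray) -> np.ndarray:
    """Eq. (3) for row vectors: project to o with W_so, return with pinv(W_to)."""
    W_so, W_to = cca(Ds, Dt)
    return W_so @ np.linalg.pinv(W_to)
```
        </preformat>
        <p>With Ds and Dt holding the vectors of the dictionary words from the two spaces, the mapped source space would be Xs @ cca_source_to_target(Ds, Dt), while the target space Xt stays untouched, as described above.</p>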
      </sec>
      <sec id="sec-3-4">
        <title>3.4 Orthogonal Transformation</title>
        <p>In the case of the Orthogonal Transformation, the
submission is referred to as ort-bin. We use
Orthogonal Transformation with a supervised seed
dictionary consisting of all words common to
both semantic spaces. The transformation matrix
W<sup>s→t</sup> is given by:</p>
        <p>
          <disp-formula>
            <label>(4)</label>
            <tex-math>\mathop{\mathrm{argmin}}_{W^{s\to t}} \sum_{i=1}^{|V|} \left\lVert W^{s\to t} x_i^{s} - x_i^{t} \right\rVert^{2}</tex-math>
          </disp-formula>
under the hard constraint that W<sup>s→t</sup> is
orthogonal, where V is the vocabulary of correct
word translations from the source space X<sup>s</sup> to the target
space X<sup>t</sup>, x<sub>i</sub><sup>s</sup> ∈ X<sup>s</sup>, and x<sub>i</sub><sup>t</sup> ∈ X<sup>t</sup>. The
reason for the orthogonality constraint is that a linear
transformation with an orthogonal matrix does not
squeeze or re-scale the transformed space. It only
rotates (or reflects) the space, so it preserves the
distances and angles between its elements. In our case, it is
important that the orthogonal transformation preserves
the angles between word vectors and therefore their cosine
similarities.
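Under the orthogonality constraint, Eq. (4) is an orthogonal Procrustes problem, which has a closed-form solution via a singular value decomposition. The NumPy sketch below is our illustration rather than the VecMap code (VecMap additionally applies normalization steps that we omit here); it stores vectors as rows, so the returned matrix R is the transpose of W<sup>s→t</sup> in Eq. (4), and the function name is hypothetical.
        <preformat preformat-type="code">
```python
import numpy as np


def orthogonal_map(Ds: np.ndarray, Dt: np.ndarray) -> np.ndarray:
    """Orthogonal Procrustes: the orthogonal R minimizing ||Ds @ R - Dt||_F.

    Row i of Ds and Dt holds the vectors of the i-th seed-dictionary word in
    the source and target space. R equals the transpose of W in Eq. (4).
    """
    U, _, Vt = np.linalg.svd(Ds.T @ Dt)
    return U @ Vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Ds = rng.normal(size=(50, 4))
    Q = np.linalg.qr(rng.normal(size=(4, 4)))[0]  # hidden rotation
    R = orthogonal_map(Ds, Ds @ Q)
    print(np.allclose(R, Q), np.allclose(R.T @ R, np.eye(4)))  # True True
```
        </preformat>
Since R is orthogonal, the mapping leaves all angles between source vectors, and hence their cosine similarities, unchanged.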
</p>
      </sec>
      <sec id="sec-3-4a">
        <title>3.5 Binary Threshold Strategy</title>
        <p>We use different strategies for the binary
classification output, but all of them are based on
continuous scores. The continuous score for each
target word is the cosine similarity
between its two vectors from the earlier and the later
corpus.</p>
        <p>In the case of the binary strategy, we assume
a threshold t such that the target words with a
cosine similarity lower than t changed their meaning
and the words with a similarity greater than t did not. We
know that this assumption does not hold in general
(a single threshold inevitably introduces some errors
into the classification), but we believe that it holds
in most cases and that it is the best choice. To
estimate the threshold t, we used an approach called
binary-threshold (cca-bin and ort-bin in Table 1).
For each target word w<sub>i</sub>, we compute the cosine
similarity of its vectors v<sup>s</sup><sub>w<sub>i</sub></sub> and v<sup>t</sup><sub>w<sub>i</sub></sub>, and then we average
these similarities over all target words. The resulting
averaged<sup>3</sup> value is used as the threshold.</p>
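        <p>The two thresholding rules can be sketched in a few lines of Python. This is an illustration under our own assumptions: the scores are held in a hypothetical word-to-similarity dictionary, and for the largest-gap rule of footnote 3 we place the threshold at the midpoint of the gap, which the text does not specify.</p>
        <preformat preformat-type="code">
```python
import numpy as np


def mean_threshold(similarities: dict) -> float:
    """binary-threshold: the average cosine similarity over all target words."""
    return float(np.mean(list(similarities.values())))


def largest_gap_threshold(similarities: dict) -> float:
    """Threshold inside the largest gap between consecutive sorted similarities.

    Taking the midpoint of that gap is an assumption of this sketch.
    """
    values = np.sort(np.array(list(similarities.values())))
    i = int(np.argmax(np.diff(values)))
    return float((values[i] + values[i + 1]) / 2)


def classify(similarities: dict, threshold: float) -> dict:
    """1 = changed (similarity below the threshold), 0 = unchanged."""
    return {w: int(threshold > s) for w, s in similarities.items()}


if __name__ == "__main__":
    sims = {"a": 0.9, "b": 0.85, "c": 0.4, "d": 0.35, "e": 0.3}
    t = mean_threshold(sims)
    print(round(t, 3), classify(sims, t))
    print(round(largest_gap_threshold(sims), 3))
```
        </preformat>
        <p>With these toy scores, the mean rule gives a threshold of 0.56 and labels c, d, and e as changed, whereas the largest-gap rule places the threshold at 0.625; here both rules split the words the same way.</p>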
      </sec>
      <sec id="sec-3-5">
        <title>3.6 Ranking Strategy</title>
        <p>The ranking strategy is the second approach for
generating a classification output (the cca-ranking
submission in Table 1). It uses the mean
rank over repeated runs of each embedding pair. In
each run, the target words are scored with the cosine
distance. The distances are then sorted, and a rank is
assigned to each target word. The ranks are averaged over the runs
to get a mean rank (and a standard deviation) for each target word and
each embedding pair. Finally, the mean ranks for all embedding pairs
are averaged. The composite rank is used, along
with an estimate of the associated cosine distance
and its corresponding angle, to divide the target
list into changed and unchanged sets. This did
not work well, because the gaps in the rank estimates and in the
distance estimates suggested competing split points.</p>
        <p>We use the number of embedding pairs, and not the
total number of runs, to compute the standard error
of the mean (the standard deviation divided
by the square root of the number of samples).</p>
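        <p>The rank aggregation can be sketched with NumPy as follows. The array layout (embedding pairs × runs × target words) and the function name are our assumptions; we also assume that the standard deviation entering the standard error is taken over all runs of all pairs, while the divisor uses only the number of embedding pairs, as stated above.</p>
        <preformat preformat-type="code">
```python
import numpy as np


def composite_rank(distances: np.ndarray):
    """Aggregate per-run rankings of the target words.

    distances: shape (pairs, runs, targets), one cosine distance per target
    word for every run of every embedding pair.
    Returns the composite mean rank of each target and its standard error.
    """
    # Rank within each run: 0 for the smallest distance (ties broken arbitrarily).
    ranks = np.argsort(np.argsort(distances, axis=-1), axis=-1).astype(float)
    per_pair = ranks.mean(axis=1)          # mean rank per (pair, target)
    composite = per_pair.mean(axis=0)      # average over embedding pairs
    n_pairs = distances.shape[0]
    spread = ranks.reshape(-1, ranks.shape[-1]).std(axis=0)
    stderr = spread / np.sqrt(n_pairs)     # divide by pairs, not by runs
    return composite, stderr
```
        </preformat>
        <p>Sorting the target words by their composite rank then yields the ordering that is split into the changed and unchanged sets.</p>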
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Experimental Setup</title>
      <p>
        To obtain the semantic spaces, we employ
Skip-gram with negative sampling
        <xref ref-type="bibr" rid="ref17">(Mikolov et al.,
2013)</xref>
        . For the final submissions, we trained the
semantic spaces with 100 (the ort-bin submission)
and 150 (the cca-bin submission) dimensions for
five iterations, with five negative samples and
a window size of five. Each word has to appear at
least five times in the corpus to be used in the
training. To train the semantic spaces, we used the
lemmatized corpora. The dimensions 100 and 150 were
selected based on our previous experience with
these methods
        <xref ref-type="bibr" rid="ref19 ref25 ref6 ref7">(Pražák et al., 2020)</xref>
        . Because we could submit four different
systems, we used a different dimension for each
of the two methods.<fn><label>3</label><p>The ort-bin submission sets the threshold to be in the
largest gap between the similarity values.</p></fn>
      </p>
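      <p>For readers who want to train comparable spaces, the settings above can be collected as keyword arguments. The mapping onto the parameter names of the gensim 4.x Word2Vec class is our assumption (the text does not name the toolkit), and the helper name is hypothetical.</p>
      <preformat preformat-type="code">
```python
def skipgram_params(dim: int) -> dict:
    """Skip-gram settings from the text, as gensim 4.x Word2Vec keyword names."""
    return {
        "vector_size": dim,  # 100 (ort-bin) or 150 (cca-bin)
        "sg": 1,             # Skip-gram instead of CBOW
        "negative": 5,       # five negative samples
        "window": 5,         # window size of five
        "min_count": 5,      # words must occur at least five times
        "epochs": 5,         # five training iterations
    }


if __name__ == "__main__":
    print(skipgram_params(100))
```
      </preformat>
      <p>With gensim installed, a space could then be trained as Word2Vec(sentences=lemmatized_sentences, **skipgram_params(100)), where lemmatized_sentences is a hypothetical iterable of token lists from one corpus.</p>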
      <p>The cca-ranking submission uses the same
settings and dimensions 100-105, 110-115, etc. up
to 210-215, resulting in 72 different dimension
sizes. It combines 40 runs on each of 72
embedding pairs, a total of 2880 runs.</p>
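      <p>The dimension grid and the run count of the cca-ranking submission can be checked with a short calculation; the helper name is ours.</p>
      <preformat preformat-type="code">
```python
def ranking_dimensions() -> list:
    """Embedding sizes 100-105, 110-115, ..., 210-215 used by cca-ranking."""
    return [base + k for base in range(100, 211, 10) for k in range(6)]


dims = ranking_dimensions()
runs_per_pair = 40
print(len(dims), len(dims) * runs_per_pair)  # 72 2880
```
      </preformat>
      <p>Twelve blocks of six consecutive sizes give the 72 embedding pairs, and 40 runs per pair give the 2880 runs.</p>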
      <p>For the cca-bin submission, we build the
translation dictionary for the transformation of the two
spaces by removing the target words from the
intersection of their vocabularies. In the case of the
cca-ranking submission, the dictionary in each
run consists of up to 5000 randomly chosen
words common to both semantic spaces.</p>
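      <p>Both ways of building the seed dictionary are sketched below in Python. The function names and the optional random seed are hypothetical, and for cca-ranking we sample from all shared words, because the text does not say whether the target words are excluded in that case.</p>
      <preformat preformat-type="code">
```python
import random


def bin_dictionary(vocab_s, vocab_t, targets):
    """cca-bin: all words shared by both vocabularies, minus the target words."""
    shared = set(vocab_s).intersection(vocab_t)
    return sorted(shared.difference(targets))


def ranking_dictionary(vocab_s, vocab_t, max_size=5000, seed=None):
    """cca-ranking: up to max_size shared words, drawn at random for one run."""
    shared = sorted(set(vocab_s).intersection(vocab_t))
    rng = random.Random(seed)
    return rng.sample(shared, min(max_size, len(shared)))


if __name__ == "__main__":
    v1 = ["casa", "cane", "gatto", "pila"]
    v2 = ["casa", "gatto", "pila", "rete"]
    print(bin_dictionary(v1, v2, targets=["pila"]))  # ['casa', 'gatto']
    print(len(ranking_dictionary(v1, v2, max_size=2, seed=0)))  # 2
```
      </preformat>
      <p>Each translation pair maps a word to itself, so a dictionary is simply a list of words that have a vector in both spaces.</p>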
      <p>The random submission is a baseline whose output
was generated completely at random.</p>
      <sec id="sec-4-1">
        <title>4.1 Corpus variants</title>
        <p>The organizers provided the corpora already
tokenized in four different versions: original tokens;
lemmatized tokens; original tokens with POS tag;
lemmatized tokens with POS tag. We
experimented with each of these variants, although in the
end, we used results based only on lemmas. Figure
1 shows the mean standard deviation of rank for
target words over forty runs for each of the 72
different embedding sizes. The most consistent variant
is lemmas only.</p>
        <fig>
          <label>Figure 1</label>
          <caption>
            <p>Mean standard deviation of target-word rank (y-axis) versus embedding dimension (x-axis, 100 to 220) for the four corpus variants: rawTokens, lemmas, rawTokens+POS, and lemmas+POS.</p>
          </caption>
        </fig>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Results</title>
      <p>We submitted four different submissions. The
accuracy results for each submission are shown in
Table 1. The ort-bin system achieved the best
accuracy of 0.944 and ranked first<sup>4</sup> among eight
other teams in the shared task, classifying 17 out
of 18 target words correctly. The cca-bin system
achieved an accuracy of 0.889 (16 correct
classifications out of 18). After the gold
labels were released, we performed an additional experiment in which
the cca-bin system also achieved an accuracy of
0.944 when it used the same word embeddings as the
ort-bin system (i.e., dimension 100 instead of 150). We also found an
optimal threshold for both systems with which they
classify all the words correctly.<sup>5</sup></p>
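      <p>The accuracy figures and the optimal (oracle) threshold mentioned above can be illustrated as follows. This is a hypothetical sketch, not the official evaluation script: it labels a word as changed when its cosine similarity falls below the threshold, and it only tries midpoints between neighboring scores, which suffices because the accuracy can change only when the threshold crosses a score.</p>
      <preformat preformat-type="code">
```python
def accuracy(predictions: dict, gold: dict) -> float:
    """Fraction of target words whose binary label matches the gold label."""
    return sum(predictions[w] == gold[w] for w in gold) / len(gold)


def oracle_threshold(similarities: dict, gold: dict):
    """Best single threshold in hindsight; it needs the gold labels.

    A word is predicted as changed (1) when its similarity is below the
    threshold. Candidates are the midpoints between neighboring sorted scores
    plus one value below and one above all scores.
    """
    values = sorted(similarities.values())
    candidates = [values[0] - 1.0, values[-1] + 1.0]
    candidates += [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_t, best_acc = None, -1.0
    for t in candidates:
        preds = {w: int(t > s) for w, s in similarities.items()}
        acc = accuracy(preds, gold)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


if __name__ == "__main__":
    # 17 and 16 correct answers out of 18 target words, as in Table 1.
    print(round(17 / 18, 3), round(16 / 18, 3))  # 0.944 0.889
```
      </preformat>
      <p>Because the oracle needs the gold labels, it can only be applied after the evaluation, which is why it appears here as an additional experiment rather than as a submitted strategy.</p>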
      <p>We believe that a key factor in the success of
our system is the sufficient size of the provided
corpora, which allowed us to train semantic spaces
of good quality and thus achieve good results.</p>
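      <table-wrap>
        <label>Table 1</label>
        <caption>
          <p>Accuracy of our submissions in the DIACR-Ita shared task.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>System</th>
              <th>Accuracy</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>cca-bin</td>
              <td>.889</td>
            </tr>
            <tr>
              <td>ort-bin</td>
              <td>.944</td>
            </tr>
            <tr>
              <td>cca-ranking</td>
              <td>.778</td>
            </tr>
            <tr>
              <td>random</td>
              <td>.500</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Our systems based on Canonical Correlation
Analysis and Orthogonal Transformation achieved
the best accuracy of 0.944 in the shared task and
ranked first among eight other teams. We showed
that our approach is a suitable solution for the
Lexical Semantic Change detection task. Applying a
threshold to semantic distance is a sensible
architecture for detecting binary semantic change
in target words between two corpora. Our
binary-threshold strategy worked quite well.</p>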
      <p>This task provided plenty of text to build good
word embeddings. Corpora with much smaller
amounts of data might have increased the
random variation between the earlier and later
embeddings, which would likely have caused problems
for our method. A flaw in our technique is that semantic
vectors are based on all senses of a word in the corpus.
We do not yet have tools to tease out what kinds of
changes are implied by a particular semantic
distance between vectors. We considered using the
part-of-speech data in the corpora, since different
parts of speech of the same lemma are likely to be
different senses. However, placing the POS tag in the token,
like using inflected forms instead of lemmas, results in
many more, less well-trained semantic vectors, as
suggested by Figure 1.<fn><label>4</label><p>We share the first place with another team that achieved
the same accuracy.</p></fn><fn><label>5</label><p>That is, 100% accuracy was possible with the continuous
scores of both methods if we only had an oracle to set the
threshold.</p></fn></p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This work has been partly supported by ERDF
“Research and Development of Intelligent
Components of Advanced Technologies for
the Pilsen Metropolitan Area (InteCom)” (no.:
CZ.02.1.01/0.0/0.0/17_048/0007267); by the
project LO1506 of the Czech Ministry of
Education, Youth and Sports; and by Grant No.
SGS-2019-018 Processing of heterogeneous
data and its specialized applications. Access
to computing and storage facilities owned by
parties and projects contributing to the National
Grid Infrastructure MetaCentrum provided under
the programme “Projects of Large Research,
Development, and Innovations Infrastructures”
(CESNET LM2015042), is greatly appreciated.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Artetxe et al.2016]
          <string-name>
            <given-names>Mikel</given-names>
            <surname>Artetxe</surname>
          </string-name>
          , Gorka Labaka, and
          <string-name>
            <given-names>Eneko</given-names>
            <surname>Agirre</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Learning principled bilingual mappings of word embeddings while preserving monolingual invariance</article-title>
          .
          <source>In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>2289</fpage>
          -
          <lpage>2294</lpage>
          , Austin, Texas, November. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Artetxe et al.2017]
          <string-name>
            <given-names>Mikel</given-names>
            <surname>Artetxe</surname>
          </string-name>
          , Gorka Labaka, and
          <string-name>
            <given-names>Eneko</given-names>
            <surname>Agirre</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Learning bilingual word embeddings with (almost) no bilingual data</article-title>
          .
          <source>In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , pages
          <fpage>451</fpage>
          -
          <lpage>462</lpage>
          , Vancouver, Canada, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Artetxe et al.2018a]
          <string-name>
            <given-names>Mikel</given-names>
            <surname>Artetxe</surname>
          </string-name>
          , Gorka Labaka, and
          <string-name>
            <given-names>Eneko</given-names>
            <surname>Agirre</surname>
          </string-name>
          .
          <year>2018a</year>
          .
          <article-title>Generalizing and improving bilingual word embedding mappings with a multi-step framework of linear transformations</article-title>
          .
          <source>In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)</source>
          , pages
          <fpage>5012</fpage>
          -
          <lpage>5019</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Artetxe et al.2018b]
          <string-name>
            <given-names>Mikel</given-names>
            <surname>Artetxe</surname>
          </string-name>
          , Gorka Labaka, and
          <string-name>
            <given-names>Eneko</given-names>
            <surname>Agirre</surname>
          </string-name>
          .
          <year>2018b</year>
          .
          <article-title>A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings</article-title>
          .
          <source>In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , pages
          <fpage>789</fpage>
          -
          <lpage>798</lpage>
          , Melbourne, Australia, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Bamman and Crane2011]
          <string-name>
            <given-names>David</given-names>
            <surname>Bamman</surname>
          </string-name>
          and
          <string-name>
            <given-names>Gregory</given-names>
            <surname>Crane</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Measuring historical word sense variation</article-title>
          .
          <source>In Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, JCDL '11</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          , New York, NY, USA. Association for Computing Machinery.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Basile et al.2020a]
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          , Annalina Caputo, Tommaso Caselli, Pierluigi Cassotti, and
          <string-name>
            <given-names>Rossella</given-names>
            <surname>Varvara</surname>
          </string-name>
          .
          <year>2020a</year>
          .
          <article-title>DIACR-Ita @ EVALITA2020: Overview of the EVALITA2020 Diachronic Lexical Semantics (DIACR-Ita) Task</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020)</source>
          , Online. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Basile et al.2020b]
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Danilo Croce, Maria Di Maro, and
          <string-name>
            <given-names>Lucia C.</given-names>
            <surname>Passaro</surname>
          </string-name>
          .
          <year>2020b</year>
          .
          <article-title>EVALITA 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for Italian</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020)</source>
          , Online. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Brychcín et al.2019]
          <string-name>
            <given-names>Tomáš</given-names>
            <surname>Brychcín</surname>
          </string-name>
          , Stephen Taylor, and
          <string-name>
            <given-names>Lukáš</given-names>
            <surname>Svoboda</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Cross-lingual word analogies using linear transformations between semantic spaces</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <volume>135</volume>
          :
          <fpage>287</fpage>
          -
          <lpage>295</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Cook et al.2014]
          <string-name>
            <given-names>Paul</given-names>
            <surname>Cook</surname>
          </string-name>
          , Jey Han Lau, Diana McCarthy, and
          <string-name>
            <given-names>Timothy</given-names>
            <surname>Baldwin</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Novel word-sense identification</article-title>
          .
          <source>In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers</source>
          , pages
          <fpage>1624</fpage>
          -
          <lpage>1635</lpage>
          , Dublin, Ireland, August. Dublin City University and Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Eger and Mehler2016]
          <string-name>
            <given-names>Steffen</given-names>
            <surname>Eger</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Mehler</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>On the linearity of semantic change: Investigating meaning variation via dynamic graph models</article-title>
          .
          <source>In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)</source>
          , pages
          <fpage>52</fpage>
          -
          <lpage>58</lpage>
          , Berlin, Germany, August. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Frermann and Lapata2016]
          <string-name>
            <given-names>Lea</given-names>
            <surname>Frermann</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mirella</given-names>
            <surname>Lapata</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>A Bayesian model of diachronic meaning change</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          ,
          <volume>4</volume>
          :
          <fpage>31</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Gulordava and Baroni2011]
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Gulordava</surname>
          </string-name>
          and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Baroni</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>A distributional similarity approach to the detection of semantic change in the Google Books Ngram corpus</article-title>
          .
          <source>In Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics</source>
          , pages
          <fpage>67</fpage>
          -
          <lpage>71</lpage>
          , Edinburgh, UK, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Hamilton et al.2016a]
          <string-name>
            <given-names>William L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          , Jure Leskovec, and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          .
          <year>2016a</year>
          .
          <article-title>Cultural shift or linguistic drift? Comparing two computational measures of semantic change</article-title>
          .
          <source>In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>2116</fpage>
          -
          <lpage>2121</lpage>
          , Austin, Texas, November. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Hamilton et al.2016b]
          <string-name>
            <given-names>William L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          , Jure Leskovec, and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          .
          <year>2016b</year>
          .
          <article-title>Diachronic word embeddings reveal statistical laws of semantic change</article-title>
          .
          <source>In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , pages
          <fpage>1489</fpage>
          -
          <lpage>1501</lpage>
          , Berlin, Germany, August. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Kutuzov et al.2018]
          <string-name>
            <given-names>Andrey</given-names>
            <surname>Kutuzov</surname>
          </string-name>
          , Lilja Øvrelid, Terrence Szymanski, and
          <string-name>
            <given-names>Erik</given-names>
            <surname>Velldal</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Diachronic word embeddings and semantic shifts: a survey</article-title>
          .
          <source>In Proceedings of the 27th International Conference on Computational Linguistics</source>
          , pages
          <fpage>1384</fpage>
          -
          <lpage>1397</lpage>
          , Santa Fe, New Mexico, USA, August. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
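          [Mihalcea and Nastase2012]
          <string-name>
            <given-names>Rada</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          and
          <string-name>
            <given-names>Vivi</given-names>
            <surname>Nastase</surname>
          </string-name>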
          .
          <year>2012</year>
          .
          <article-title>Word epoch disambiguation: Finding how words change over time</article-title>
          .
          <source>In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)</source>
          , pages
          <fpage>259</fpage>
          -
          <lpage>263</lpage>
          , Jeju Island, Korea, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Mikolov et al.2013]
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Kai Chen, Greg Corrado, and
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>In Proceedings of Workshop at ICLR</source>
          . arXiv:1301.3781.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Mitra et al.2015]
          <string-name>
            <given-names>Sunny</given-names>
            <surname>Mitra</surname>
          </string-name>
          , Ritwik Mitra, Suman Kalyan Maity, Martin Riedl, Chris Biemann, Pawan Goyal, and
          <string-name>
            <given-names>Animesh</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>An automatic approach to identify word sense changes in text media across timescales</article-title>
          .
          <source>Natural Language Engineering</source>
          ,
          <volume>21</volume>
          (
          <issue>5</issue>
          ):
          <fpage>773</fpage>
          -
          <lpage>798</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Pražák et al.2020]
          <string-name>
            <given-names>Ondřej</given-names>
            <surname>Pražák</surname>
          </string-name>
          , Pavel Přibáň,
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Taylor</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jakub</given-names>
            <surname>Sido</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>UWB at SemEval-2020 Task 1: Lexical Semantic Change Detection</article-title>
          .
          <source>In Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval-2020)</source>
          , Barcelona, Spain, Sep. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
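          [Rosenfeld and Erk2018]
          <string-name>
            <given-names>Alex</given-names>
            <surname>Rosenfeld</surname>
          </string-name>
          and
          <string-name>
            <given-names>Katrin</given-names>
            <surname>Erk</surname>
          </string-name>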
          .
          <year>2018</year>
          .
          <article-title>Deep neural models of semantic shift</article-title>
          .
          <source>In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume 1 (Long Papers)
          , pages
          <fpage>474</fpage>
          -
          <lpage>484</lpage>
          , New Orleans, Louisiana, June. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [Schlechtweg and Walde2020]
          <string-name>
            <given-names>Dominik</given-names>
            <surname>Schlechtweg</surname>
          </string-name>
          and Sabine Schulte im Walde
          .
          <year>2020</year>
          .
          <article-title>Simulating lexical semantic change from sense-annotated data</article-title>
          . In A. Ravignani,
          <string-name>
            <given-names>C.</given-names>
            <surname>Barbieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Flaherty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jadoul</surname>
          </string-name>
          , E. Lattenkamp,
          <string-name>
            <given-names>H.</given-names>
            <surname>Little</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mudd</surname>
          </string-name>
          , and T. Verhoef, editors,
          <source>The Evolution of Language: Proceedings of the 13th International Conference (EvoLang13).</source>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [Schlechtweg et al.2017]
          <string-name>
            <given-names>Dominik</given-names>
            <surname>Schlechtweg</surname>
          </string-name>
          , Stefanie Eckmann, Enrico Santus, Sabine Schulte im Walde, and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Hole</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>German in flux: Detecting metaphoric change via word entropy</article-title>
          .
          <source>In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</source>
          , pages
          <fpage>354</fpage>
          -
          <lpage>367</lpage>
          , Vancouver, Canada, August. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [Schlechtweg et al.2018]
          <string-name>
            <given-names>Dominik</given-names>
            <surname>Schlechtweg</surname>
          </string-name>
          , Sabine Schulte im Walde, and
          <string-name>
            <given-names>Stefanie</given-names>
            <surname>Eckmann</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Diachronic usage relatedness (DURel): A framework for the annotation of lexical semantic change</article-title>
          .
          <source>In Proceedings of NAACL-HLT 2018</source>
          , pages
          <fpage>169</fpage>
          -
          <lpage>174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [Schlechtweg et al.2019]
          <string-name>
            <given-names>Dominik</given-names>
            <surname>Schlechtweg</surname>
          </string-name>
          , Anna Hätty, Marco Del Tredici, and Sabine Schulte im Walde
          .
          <year>2019</year>
          .
          <article-title>A wind of change: Detecting and evaluating lexical semantic change across times and domains</article-title>
          .
          <source>In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>732</fpage>
          -
          <lpage>746</lpage>
          , Florence, Italy, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [Schlechtweg et al.2020]
          <string-name>
            <given-names>Dominik</given-names>
            <surname>Schlechtweg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Barbara</given-names>
            <surname>McGillivray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Simon</given-names>
            <surname>Hengchen</surname>
          </string-name>
          , Haim Dubossarsky, and
          <string-name>
            <given-names>Nina</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection</article-title>
          .
          <source>In Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval-2020)</source>
          , Barcelona, Spain, Sep. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [Straka2018]
          <string-name>
            <given-names>Milan</given-names>
            <surname>Straka</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>UDPipe 2.0 prototype at CoNLL 2018 UD shared task</article-title>
          .
          <source>In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies</source>
          , pages
          <fpage>197</fpage>
          -
          <lpage>207</lpage>
          , Brussels, Belgium, October. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [Tahmasebi and Risse2017]
          <string-name>
            <given-names>Nina</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          and Thomas Risse
          .
          <year>2017</year>
          .
          <article-title>Finding individual word sense changes and their delay in appearance</article-title>
          .
          <source>In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017</source>
          , pages
          <fpage>741</fpage>
          -
          <lpage>749</lpage>
          , Varna, Bulgaria, September. INCOMA Ltd.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [Tahmasebi et al.2018]
          <string-name>
            <given-names>Nina</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          , Lars Borin, and
          <string-name>
            <given-names>Adam</given-names>
            <surname>Jatowt</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Survey of computational approaches to lexical semantic change</article-title>
          . arXiv preprint arXiv:1811.06278.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>