=Paper=
{{Paper
|id=Vol-2765/110
|storemode=property
|title=UWB @ DIACR-Ita: Lexical Semantic Change Detection with CCA and Orthogonal Transformation
|pdfUrl=https://ceur-ws.org/Vol-2765/paper110.pdf
|volume=Vol-2765
|authors=Ondřej Pražák,Pavel Přibáň,Stephen Taylor
|dblpUrl=https://dblp.org/rec/conf/evalita/PrazakP020
}}
==UWB @ DIACR-Ita: Lexical Semantic Change Detection with CCA and Orthogonal Transformation==
Ondřej Pražák*¹,², Pavel Přibáň*¹,², and Stephen Taylor*²

¹NTIS – New Technologies for the Information Society,
²Department of Computer Science and Engineering,
Faculty of Applied Sciences, University of West Bohemia, Czech Republic
{ondfa, pribanp, taylor}@kiv.zcu.cz
http://nlp.kiv.zcu.cz

*Equal contribution. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Abstract

In this paper, we describe our method for the detection of lexical semantic change (i.e., word sense changes over time) for the DIACR-Ita shared task, where we ranked 1st. We examine semantic differences between specific words in two Italian corpora, chosen from different time periods. Our method is fully unsupervised and language independent. It consists of preparing a semantic vector space for each corpus, earlier and later. Then we compute a linear transformation between the earlier and later spaces, using CCA and Orthogonal Transformation. Finally, we measure the cosines between the transformed vectors.

1 Introduction

Language evolves with time. New words appear, old words fall out of use, and the meanings of some words shift. There are changes in topics, syntax, and presentation structure. Reading the natural philosophy musings of aristocratic amateurs from the eighteenth century, and comparing them with a monograph from the nineteenth century or a medical study from the twentieth century, we can observe differences in many dimensions, some of which need a deep historical background to study. Changes in word senses are both a visible and a tractable part of language evolution.

Computational methods for researching the stories of words have the potential to help us understand this small corner of linguistic evolution. The tools for measuring these diachronic semantic shifts might also be useful for measuring whether the same word is used in different ways in synchronic documents. The task of finding word sense changes over time is called diachronic Lexical Semantic Change (LSC) detection. The task has received growing attention in recent years (Hamilton et al., 2016b; Schlechtweg et al., 2017; Schlechtweg et al., 2020). There is also the synchronic LSC task, which aims to identify domain-specific changes of word senses compared to general-language usage (Schlechtweg et al., 2019).

1.1 Related Work

Tahmasebi et al. (2018) provide a comprehensive survey of techniques for the LSC task, as do Kutuzov et al. (2018). Schlechtweg et al. (2019) evaluate available approaches for LSC detection using the DURel dataset (Schlechtweg et al., 2018). Schlechtweg et al. (2020) present the results of the first shared task that addresses the LSC problem and provide an evaluation dataset that was manually annotated for four languages.

According to Schlechtweg et al. (2019), there are three main types of approaches. (1) Semantic vector space approaches (Gulordava and Baroni, 2011; Eger and Mehler, 2016; Hamilton et al., 2016a; Hamilton et al., 2016b; Rosenfeld and Erk, 2018; Pražák et al., 2020) represent each word with two vectors for the two different time periods; the change of meaning is then measured by some distance (usually the cosine distance) between the two vectors. (2) Topic modeling approaches (Bamman and Crane, 2011; Mihalcea and Nastase, 2012; Cook et al., 2014; Frermann and Lapata, 2016; Schlechtweg and Walde, 2020) estimate a probability distribution of words over their different senses, i.e., topics. (3) Clustering models (Mitra et al., 2015; Tahmasebi and Risse, 2017).

1.2 The DIACR-Ita task
The goal of the DIACR-Ita task (Basile et al., 2020a; Basile et al., 2020b) is to establish whether a set of Italian words (target words) changed their meaning from time period t1 to time period t2 (i.e., a binary classification task). The organizers provide corresponding corpora C1 and C2 and a list of target words. Only these inputs may be used to train systems, which judge, for each target word, whether it changed or not. The task is the same as the binary sub-task of the SemEval-2020 Task 1 (Schlechtweg et al., 2020) competition.

2 Data

The DIACR-Ita data consists of many randomly ordered text samples that have no relationship to each other. Most of the text samples are complete sentences, but some are sentence fragments. The 'early' corpus C1 has about 2.4 million text samples and 52 million tokens; the 'later' corpus C2 has about 7.8 million text samples and 738 million tokens. Each token is given in the corpora with its part-of-speech tag and lemma. The target word list consists of 18 lemmas. The POS tags and lemmas of the corpora were generated with the UDPipe (Straka, 2018) model ISDT-UD v2.5, which has an error rate of about 2%.
3 System Description

3.1 Overview

Because language evolves, the expressions, words, and sentence constructions in two corpora from different time periods about the same topic will be written in languages that are quite similar but slightly different. They will share the majority of their words, grammar, and syntax. We can observe a similar situation in languages from the same family, such as Italian and Spanish among the Romance languages or Czech and Slovak among the Slavic languages. These pairs of languages share a lot of common words, expressions, and syntax. For some pairs, native speakers can understand and sometimes even actively communicate across a (low) language barrier.

Our system follows the approach from (Pražák et al., 2020)¹. The main idea behind our solution is that we treat the pair of corpora C1 and C2 as two different languages L1 and L2, even though the text in both corpora is written in Italian. We believe that these two languages L1 and L2 will be extremely similar in all aspects, including semantics. We train a separate semantic space for each corpus, and subsequently, we map these two spaces into one common cross-lingual space. We use methods for cross-lingual mapping (Brychcı́n et al., 2019; Artetxe et al., 2016; Artetxe et al., 2017; Artetxe et al., 2018a; Artetxe et al., 2018b), and thanks to the large similarity between L1 and L2, the quality of the transformation should be high. We compute the cosine similarity of the transformed word vectors to classify whether the target words changed their sense.

¹The source code is available at https://github.com/pauli31/SemEval2020-task1
3.2 Semantic Space Transformation

First, we train two semantic spaces, one from corpus C1 and one from C2. We represent the semantic spaces by a matrix Xs (i.e., a source space s) and a matrix Xt (i.e., a target space t)², trained with word2vec Skip-gram with negative sampling (Mikolov et al., 2013). We then perform a cross-lingual mapping of the two vector spaces, getting two matrices X̂s and X̂t projected into a shared space. We select two methods for the cross-lingual mapping: Canonical Correlation Analysis (CCA), using the implementation from (Brychcı́n et al., 2019), and a modification of the Orthogonal Transformation from VecMap (Artetxe et al., 2018b). Both of these methods are linear transformations, which can be written as follows:

\[ \hat{X}_s = W_{s \to t} X_s \qquad (1) \]

where Ws→t is a matrix that performs a linear transformation from the source space s (matrix Xs) into the target space t, and X̂s is the source space transformed into the target space t (the matrix Xt does not have to be transformed, because Xt is already in the target space t and Xt = X̂t).

Finally, in all transformation methods, for each word wi from the set of target words T, we select its corresponding vectors v^s_wi and v^t_wi from the matrices X̂s and X̂t, respectively (v^s_wi ∈ X̂s and v^t_wi ∈ X̂t), and we compute the cosine similarity between these two vectors. The cosine similarity is then used to generate the final classification output using different strategies, see Sections 3.5 and 3.6.

²The source space Xs is created from the corpus C1 and the target space Xt is created from the corpus C2.
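The whole pipeline is compact enough to sketch. Below is a minimal Python illustration of this section, assuming the corpora are available as iterables of lemmatized sentences; the variables sentences_c1, sentences_c2, and target_words, and the learn_transformation placeholder, are hypothetical stand-ins, not the authors' repository code.

```python
# Minimal sketch of Section 3.2, assuming gensim >= 4.0.
# sentences_c1, sentences_c2, target_words and learn_transformation
# are hypothetical placeholders, not the authors' actual code.
import numpy as np
from gensim.models import Word2Vec

def train_space(sentences, dim):
    """Train a Skip-gram space with negative sampling (Mikolov et al., 2013)."""
    return Word2Vec(sentences, vector_size=dim, sg=1, negative=5,
                    epochs=5, window=5, min_count=5).wv

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

wv_s = train_space(sentences_c1, dim=100)  # source space Xs, from C1
wv_t = train_space(sentences_c2, dim=100)  # target space Xt, from C2

# W realizes Eq. (1); it comes from CCA (Section 3.3) or from the
# Orthogonal Transformation (Section 3.4).
W = learn_transformation(wv_s, wv_t)

scores = {w: cosine(wv_s[w] @ W, wv_t[w])  # row-vector convention
          for w in target_words if w in wv_s and w in wv_t}
```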
3.3 Canonical Correlation Analysis

Generally, the CCA transformation transforms both spaces Xs and Xt into a third shared space o (where Xs ≠ X̂s and Xt ≠ X̂t). Thus, CCA computes two transformation matrices: Ws→o for the source space and Wt→o for the target space. The transformation matrices are computed by minimizing the negative correlation between the vectors x^s_i ∈ Xs and x^t_i ∈ Xt that are projected into the shared space o. The negative correlation is defined as follows:

\[ \operatorname*{arg\,min}_{W_{s \to o},\, W_{t \to o}} \sum_{i=1}^{n} -\rho(W_{s \to o} x^s_i, W_{t \to o} x^t_i) = \sum_{i=1}^{n} -\frac{\operatorname{cov}(W_{s \to o} x^s_i, W_{t \to o} x^t_i)}{\sqrt{\operatorname{var}(W_{s \to o} x^s_i) \times \operatorname{var}(W_{t \to o} x^t_i)}} \qquad (2) \]

where cov is the covariance, var is the variance, and n is the number of vectors used for computing the transformation. In our implementation of CCA, the matrix X̂t is equal to the matrix Xt, because we transform only the source space s (matrix Xs) into the target space t, going through the common shared space with a pseudo-inversion; the target space does not change. The matrix Ws→t for this transformation is then given by:

\[ W_{s \to t} = W_{s \to o} (W_{t \to o})^{-1} \qquad (3) \]

The submissions that use CCA are referred to as cca-bin and cca-ranking in Table 1. The -bin and -ranking parts refer to the strategy used for the final classification decision, see Sections 3.5 and 3.6.
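As an illustration, the following sketch derives Ws→t from paired seed vectors with a standard whitening-plus-SVD formulation of CCA. It assumes matrices Xs and Xt with one row per dictionary word (row-vector convention, i.e., the transpose of Eqs. (1)–(3)) and is not necessarily identical to the implementation of Brychcı́n et al. (2019).

```python
# Sketch of the CCA mapping (Eqs. 2 and 3); a standard whitening+SVD
# formulation, not necessarily the exact implementation used in the paper.
import numpy as np

def cca_transform(Xs, Xt):
    """Xs, Xt: (n x d) paired seed matrices, one row per dictionary word."""
    Xs = Xs - Xs.mean(axis=0)                 # center both spaces
    Xt = Xt - Xt.mean(axis=0)
    n = Xs.shape[0]
    Css, Ctt, Cst = Xs.T @ Xs / n, Xt.T @ Xt / n, Xs.T @ Xt / n

    def inv_sqrt(C):                          # C^{-1/2} via eigendecomposition
        vals, vecs = np.linalg.eigh(C)
        vals = np.clip(vals, 1e-12, None)     # guard against tiny eigenvalues
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Ks, Kt = inv_sqrt(Css), inv_sqrt(Ctt)
    U, _, Vt = np.linalg.svd(Ks @ Cst @ Kt)   # maximally correlated directions
    W_so = Ks @ U                             # W_{s->o}: source rows -> shared o
    W_to = Kt @ Vt.T                          # W_{t->o}: target rows -> shared o
    # Eq. (3): go through the shared space back into the target space.
    return W_so @ np.linalg.pinv(W_to)        # W_{s->t}; apply as Xs @ W
```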
3.4 Orthogonal Transformation

In the case of the Orthogonal Transformation, the submission is referred to as ort-bin. We use the Orthogonal Transformation with a supervised seed dictionary consisting of all words common to both semantic spaces. The transformation matrix Ws→t is given by:

\[ \operatorname*{arg\,min}_{W_{s \to t}} \sum_{i}^{|V|} (W_{s \to t} x^s_i - x^t_i)^2 \qquad (4) \]

under the hard constraint that Ws→t must be orthogonal, where V is the vocabulary of correct word translations from the source space Xs to the target space Xt, and x^s_i ∈ Xs and x^t_i ∈ Xt. The reason for the orthogonality constraint is that a linear transformation with an orthogonal matrix does not squeeze or re-scale the transformed space. It only rotates the space, and thus it preserves most of the relationships of its elements (in our case, it is important that the orthogonal transformation preserves angles between the words, so it preserves the cosine similarity).
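Under the same row-vector convention as the CCA sketch above, the constrained problem in Eq. (4) has the closed-form orthogonal Procrustes solution; a minimal sketch:

```python
# Sketch of the Orthogonal Transformation (Eq. 4) as the orthogonal
# Procrustes solution; Xs and Xt hold the paired seed-dictionary vectors.
import numpy as np

def orthogonal_transform(Xs, Xt):
    """Minimize ||Xs W - Xt||_F^2 subject to W being orthogonal."""
    U, _, Vt = np.linalg.svd(Xs.T @ Xt)
    return U @ Vt   # a pure rotation: preserves angles, hence cosines
```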
3.5 Binary Strategy

We use different strategies for the binary classification output, but all of them have in common that they use continuous scores. The continuous score for each target word is computed as the cosine similarity between its two vectors from the earlier and later corpus.

In the case of the binary strategy, we assume a threshold t such that the target words with a continuous score lower than t changed meaning and the words with a score greater than t did not. We know that this assumption is generally wrong (by using a threshold, we introduce some error into the classification), but we still believe it holds in most cases and is the best choice. To estimate the threshold t, we use an approach called binary-threshold (cca-bin and ort-bin in Table 1). For each target word wi we compute the cosine similarity of its vectors v^s_wi and v^t_wi; then we average these similarities over all target words. The resulting averaged³ value is used as the threshold.

³The ort-bin submission sets the threshold to be in the largest gap between the similarity values.
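A minimal sketch of the binary-threshold decision, assuming scores maps each target word to the cosine similarity of its two mapped vectors (the largest-gap variant used by ort-bin, see footnote 3, would place the threshold differently):

```python
# Sketch of the binary-threshold strategy (Section 3.5).
import numpy as np

def binary_threshold(scores):
    """scores: {target word: cosine similarity across the two spaces}."""
    t = np.mean(list(scores.values()))    # averaged similarity as threshold
    # Low similarity across the two periods signals a change of meaning.
    return {w: int(sim < t) for w, sim in scores.items()}
```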
3.6 Ranking Strategy

The ranking strategy is the second approach for generating a classification output (the submission result cca-ranking in Table 1). It uses the mean rank over repeated runs of each embedding pair. In each run, the target words are scored with the cosine distance. The distances for each embedding pair are then sorted and a rank-order is assigned to each target. The rank-orders are averaged to get a mean rank (and a standard deviation) for each target for each pair. Finally, the ranks over all embedding pairs are averaged. The composite rank is used, along with an estimate of the associated cosine distance and its corresponding angle, to divide the target list into changed and unchanged sets. This does not work well; there are competing gaps in rank and in the distance estimates.

We use the number of embeddings, and not the total number of runs, to compute the standard error of the mean (which is the standard deviation divided by the square root of the number of samples).
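A sketch of the rank aggregation described above, assuming runs_per_pair holds one list of per-run {word: cosine distance} dictionaries per embedding pair; it illustrates the averaging, not the exact submission code:

```python
# Sketch of the ranking strategy (Section 3.6).
import numpy as np

def mean_ranks(runs_per_pair):
    """runs_per_pair: list (one per embedding pair) of lists of score dicts."""
    per_pair = []
    for runs in runs_per_pair:
        ranks = {w: [] for w in runs[0]}
        for scores in runs:                       # rank by cosine distance
            for rank, w in enumerate(sorted(scores, key=scores.get)):
                ranks[w].append(rank)
        per_pair.append({w: np.mean(r) for w, r in ranks.items()})
    words = per_pair[0].keys()
    composite = {w: np.mean([m[w] for m in per_pair]) for w in words}
    # Standard error over embedding pairs, not over the total number of runs.
    sem = {w: np.std([m[w] for m in per_pair]) / np.sqrt(len(per_pair))
           for w in words}
    return composite, sem
```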
4 Experimental Setup

To obtain the semantic spaces, we employ Skip-gram with negative sampling (Mikolov et al., 2013). For the final submission, we trained the semantic spaces with 100 (the ort-bin submission) and 150 (the cca-bin submission) dimensions, for five iterations, with five negative samples and a window size of five. Each word has to appear at least five times in the corpus to be used in the training. To train the semantic spaces, we used the lemmatized corpora. The dimensions 100 and 150 were selected based on our previous experience with these methods (Pražák et al., 2020). Since we were able to submit four different submissions, we did not use the same dimension for both methods.

The cca-ranking submission uses the same settings and dimensions 100-105, 110-115, etc., up to 210-215, resulting in 72 different dimension sizes. It combines 40 runs on each of the 72 embedding pairs, a total of 2,880 runs.
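For concreteness, the hyperparameters above correspond roughly to the following gensim configuration (a sketch; lemmatized_sentences is a hypothetical iterable over the lemmatized corpus, not the authors' exact training script):

```python
from gensim.models import Word2Vec

model = Word2Vec(
    sentences=lemmatized_sentences,  # hypothetical corpus iterable
    vector_size=100,  # 100 for ort-bin; 150 for cca-bin
    sg=1,             # Skip-gram
    negative=5,       # five negative samples
    epochs=5,         # five iterations
    window=5,         # window size five
    min_count=5,      # a word must occur at least five times
)
```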
For the cca-bin submission, we build the translation dictionary for the transformation of the two spaces by removing the target words from the intersection of the two vocabularies. In the case of the cca-ranking submission, the dictionary in each run consists of up to 5,000 randomly chosen common words for each semantic space.

The random submission represents output that was generated completely randomly.
4.1 Corpus variants

The organizers provided the corpora already tokenized, in four different versions: original tokens; lemmatized tokens; original tokens with POS tags; and lemmatized tokens with POS tags. We experimented with each of these variants, although in the end, we used only the results based on lemmas. Figure 1 shows the mean standard deviation of rank for the target words over forty runs for each of the 72 different embedding sizes. The most consistent variant is the lemmas-only one.

[Figure 1: Standard deviation (of rank) versus embedding size for four versions of the corpora (rawTokens, lemmas, rawTokens+POS, lemmas+POS).]

5 Results

We submitted four different submissions. The accuracy results for each submission are shown in Table 1. The ort-bin system achieved the best accuracy of 0.944 and ranked first⁴ among eight other teams in the shared task, classifying 17 out of 18 target words correctly. The cca-bin system achieved an accuracy of 0.889 (16 correct classifications out of 18). After the release of the gold labels, we performed an additional experiment with the cca-bin system, which also achieved an accuracy of 0.944 when the same word embeddings (with embedding dimension 100 instead of 150) are used as for the ort-bin system. We found an optimal threshold for both systems which makes them classify all the words correctly⁵.

We believe that the key factor in the success of our system is the sufficient size of the provided corpora. Thanks to that, we were able to train semantic spaces of good quality and thus achieve good results.

System        Accuracy
cca-bin       .889
ort-bin       .944
cca-ranking   .778
random        .500

Table 1: Results for our final submissions.

⁴We share first place with another team that achieved the same accuracy.
⁵That is, 100% accuracy was possible with the continuous scores of both methods if we only had an oracle to set the threshold.

6 Conclusion

Our systems based on Canonical Correlation Analysis and Orthogonal Transformation achieved the best accuracy of 0.944 in the shared task and ranked first among eight other teams. We showed that our approach is a suitable solution for the Lexical Semantic Change detection task. Applying a threshold to a semantic distance is a sensible architecture for detecting binary semantic change in target words between two corpora. Our binary-threshold strategy succeeded quite well.
This task provided plenty of text to build good word embeddings. Corpora with much smaller amounts of data might have increased the random variation between the earlier and later embeddings, which would have given our method problems. A flaw in our technique is that the semantic vectors are based on all senses of a word in the corpus. We do not yet have tools to tease out what kinds of changes are implied by a particular semantic distance between vectors. We considered using the part-of-speech data in the corpora, since different parts of speech for the same lemma are likely different senses. But placing the POS in the token, like using inflections instead of lemmas, results in many more, less well-trained semantic vectors, as suggested by Figure 1.

Acknowledgements

This work has been partly supported by ERDF "Research and Development of Intelligent Components of Advanced Technologies for the Pilsen Metropolitan Area (InteCom)" (no. CZ.02.1.01/0.0/0.0/17 048/0007267); by the project LO1506 of the Czech Ministry of Education, Youth and Sports; and by Grant No. SGS-2019-018 "Processing of heterogeneous data and its specialized applications". Access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the programme "Projects of Large Research, Development, and Innovations Infrastructures" (CESNET LM2015042), is greatly appreciated.
References

[Artetxe et al.2016] Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2016. Learning principled bilingual mappings of word embeddings while preserving monolingual invariance. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2289–2294, Austin, Texas, November. Association for Computational Linguistics.

[Artetxe et al.2017] Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2017. Learning bilingual word embeddings with (almost) no bilingual data. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 451–462, Vancouver, Canada, July. Association for Computational Linguistics.

[Artetxe et al.2018a] Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2018a. Generalizing and improving bilingual word embedding mappings with a multi-step framework of linear transformations. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pages 5012–5019.

[Artetxe et al.2018b] Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2018b. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 789–798, Melbourne, Australia, July. Association for Computational Linguistics.

[Bamman and Crane2011] David Bamman and Gregory Crane. 2011. Measuring historical word sense variation. In Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, JCDL '11, pages 1–10, New York, NY, USA. Association for Computing Machinery.

[Basile et al.2020a] Pierpaolo Basile, Annalina Caputo, Tommaso Caselli, Pierluigi Cassotti, and Rossella Varvara. 2020a. DIACR-Ita @ EVALITA2020: Overview of the EVALITA2020 Diachronic Lexical Semantics (DIACR-Ita) Task. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020), Online. CEUR.org.

[Basile et al.2020b] Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro. 2020b. EVALITA 2020: Overview of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.

[Brychcı́n et al.2019] Tomáš Brychcı́n, Stephen Taylor, and Lukáš Svoboda. 2019. Cross-lingual word analogies using linear transformations between semantic spaces. Expert Systems with Applications, 135:287–295.

[Cook et al.2014] Paul Cook, Jey Han Lau, Diana McCarthy, and Timothy Baldwin. 2014. Novel word-sense identification. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 1624–1635, Dublin, Ireland, August. Dublin City University and Association for Computational Linguistics.

[Eger and Mehler2016] Steffen Eger and Alexander Mehler. 2016. On the linearity of semantic change: Investigating meaning variation via dynamic graph models. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 52–58, Berlin, Germany, August. Association for Computational Linguistics.

[Frermann and Lapata2016] Lea Frermann and Mirella Lapata. 2016. A Bayesian model of diachronic meaning change. Transactions of the Association for Computational Linguistics, 4:31–45.

[Gulordava and Baroni2011] Kristina Gulordava and Marco Baroni. 2011. A distributional similarity approach to the detection of semantic change in the Google Books ngram corpus. In Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics, pages 67–71, Edinburgh, UK, July. Association for Computational Linguistics.

[Hamilton et al.2016a] William L. Hamilton, Jure Leskovec, and Dan Jurafsky. 2016a. Cultural shift or linguistic drift? Comparing two computational measures of semantic change. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2116–2121, Austin, Texas, November. Association for Computational Linguistics.

[Hamilton et al.2016b] William L. Hamilton, Jure Leskovec, and Dan Jurafsky. 2016b. Diachronic word embeddings reveal statistical laws of semantic change. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1489–1501, Berlin, Germany, August. Association for Computational Linguistics.

[Kutuzov et al.2018] Andrey Kutuzov, Lilja Øvrelid, Terrence Szymanski, and Erik Velldal. 2018. Diachronic word embeddings and semantic shifts: a survey. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1384–1397, Santa Fe, New Mexico, USA, August. Association for Computational Linguistics.

[Mihalcea and Nastase2012] Rada Mihalcea and Vivi Nastase. 2012. Word epoch disambiguation: Finding how words change over time. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 259–263, Jeju Island, Korea, July. Association for Computational Linguistics.

[Mikolov et al.2013] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In Proceedings of Workshop at ICLR. arXiv:1301.3781.

[Mitra et al.2015] Sunny Mitra, Ritwik Mitra, Suman Kalyan Maity, Martin Riedl, Chris Biemann, Pawan Goyal, and Animesh Mukherjee. 2015. An automatic approach to identify word sense changes in text media across timescales. Natural Language Engineering, 21(5):773–798.

[Pražák et al.2020] Ondřej Pražák, Pavel Přibáň, Stephen Taylor, and Jakub Sido. 2020. UWB at SemEval-2020 Task 1: Lexical semantic change detection. In Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval-2020), Barcelona, Spain, September. Association for Computational Linguistics.

[Rosenfeld and Erk2018] Alex Rosenfeld and Katrin Erk. 2018. Deep neural models of semantic shift. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 474–484, New Orleans, Louisiana, June. Association for Computational Linguistics.

[Schlechtweg and Walde2020] Dominik Schlechtweg and Sabine Schulte im Walde. 2020. Simulating lexical semantic change from sense-annotated data. In A. Ravignani, C. Barbieri, M. Martins, M. Flaherty, Y. Jadoul, E. Lattenkamp, H. Little, K. Mudd, and T. Verhoef, editors, The Evolution of Language: Proceedings of the 13th International Conference (EvoLang13).

[Schlechtweg et al.2017] Dominik Schlechtweg, Stefanie Eckmann, Enrico Santus, Sabine Schulte im Walde, and Daniel Hole. 2017. German in flux: Detecting metaphoric change via word entropy. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 354–367, Vancouver, Canada, August. Association for Computational Linguistics.

[Schlechtweg et al.2018] Dominik Schlechtweg, Sabine Schulte im Walde, and Stefanie Eckmann. 2018. Diachronic usage relatedness (DURel): A framework for the annotation of lexical semantic change. In Proceedings of NAACL-HLT 2018, pages 169–174.

[Schlechtweg et al.2019] Dominik Schlechtweg, Anna Hätty, Marco Del Tredici, and Sabine Schulte im Walde. 2019. A wind of change: Detecting and evaluating lexical semantic change across times and domains. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 732–746, Florence, Italy, July. Association for Computational Linguistics.

[Schlechtweg et al.2020] Dominik Schlechtweg, Barbara McGillivray, Simon Hengchen, Haim Dubossarsky, and Nina Tahmasebi. 2020. SemEval-2020 Task 1: Unsupervised lexical semantic change detection. In Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval-2020), Barcelona, Spain, September. Association for Computational Linguistics.

[Straka2018] Milan Straka. 2018. UDPipe 2.0 prototype at CoNLL 2018 UD shared task. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 197–207, Brussels, Belgium, October. Association for Computational Linguistics.

[Tahmasebi and Risse2017] Nina Tahmasebi and Thomas Risse. 2017. Finding individual word sense changes and their delay in appearance. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 741–749, Varna, Bulgaria, September. INCOMA Ltd.

[Tahmasebi et al.2018] Nina Tahmasebi, Lars Borin, and Adam Jatowt. 2018. Survey of computational approaches to lexical semantic change. arXiv preprint arXiv:1811.06278.