<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>QMUL-SDS @ DIACR-Ita: Evaluating Unsupervised Diachronic Lexical Semantics Classification in Italian</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rabab Alkhalifa</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adam Tsakalidis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arkaitz Zubiaga</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Liakata</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Alan Turing Institute</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Imam Abdulrahman bin Faisal University</institution>
          ,
          <country country="SA">Saudi Arabia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Queen Mary University of London</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present the results and main findings of our system for the DIACR-Ita 2020 Task. Our system focuses on using variations of training sets and different semantic detection methods. The task involves training, aligning and predicting a word's vector change from two diachronic Italian corpora. We demonstrate that using a Temporal Word Embeddings with a Compass (TWEC) CBOW model is more effective, in terms of accuracy, than alternative approaches including Linear Regression and a Feed-Forward Neural Network. Our model ranked 3rd with an accuracy of 83.3%.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The quantitative analysis of language
evolution over time is an emerging research area
within the domain of Natural Language
Processing
        <xref ref-type="bibr" rid="ref12 ref25 ref8 ref9">(Turney and Pantel, 2010; Hamilton et al.,
2016; Dubossarsky et al., 2017)</xref>
        . The study of
Diachronic Lexical Semantics
        <xref ref-type="bibr" rid="ref14 ref21">(Tahmasebi et al.,
2018; Kutuzov et al., 2018)</xref>
        , which contributes
towards detecting word-level language
evolution, brings together researchers with broadly
varying backgrounds from computational
linguistics, cognitive science, statistics,
mathematics, and historical linguistics, since the
identification of words whose lexical semantics have
changed over time has numerous downstream
applications in various domains such as
historical linguistics and NLP. Despite the increase
in research interest, few tasks that track word
meaning change over time have focused on
non-English languages, while the comparison of
different approaches in the same experimental and
evaluation setting is still limited
        <xref ref-type="bibr" rid="ref18">(Schlechtweg et
al., 2020)</xref>
        . The DIACR-Ita 2020 Task
        <xref ref-type="bibr" rid="ref3 ref4">(Basile et al.,
2020a; Basile et al., 2020b)</xref>
        aims to fill these gaps
by focusing on the Italian language used during
two different time periods and by providing a single
evaluation framework to researchers for testing
their methods.
      </p>
      <p>Copyright © 2020 for this paper by its authors. Use
permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).</p>
      <p>
        This work presents our approach towards
detecting Italian words with altered lexical
semantics during the two distinct time periods studied
in the DIACR-Ita 2020 Shared Task. Our
contribution focuses on evaluating findings from
previous studies, exploring evaluation approaches
for different methods and comparing their
performance. We contrast several variants of
training-testing words with different alignment
approaches across two word embedding
models, namely Skip-gram and Continuous
Bag-of-Words
        <xref ref-type="bibr" rid="ref15">(Mikolov et al., 2013)</xref>
        . Our submission
consisted of four models that showed the best
average cosine similarity, calculated on the basis
of their ability to accurately reconstruct the
representations of Italian stop-words across the two
periods of time under study. Our best
performing model uses a Continuous Bag-of-Words
temporal compass model, adapted from the model
introduced by
        <xref ref-type="bibr" rid="ref6">Carlo et al. (2019)</xref>
        . Our system
ranked third in the task.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>
        Work related to unsupervised diachronic lexical
semantics detection can be divided into
different approaches depending on the type of word
representations used in a diachronic model
(e.g., based on graphs or probability
distributions
        <xref ref-type="bibr" rid="ref1 ref10 ref12">(Frermann and Lapata, 2016; Azarbonyad
et al., 2017)</xref>
        , temporal dimensions
        <xref ref-type="bibr" rid="ref14 ref2 ref21">(Basile and
McGillivray, 2018)</xref>
        , frequencies or co-occurrence
matrices
        <xref ref-type="bibr" rid="ref16 ref25 ref8">(Sagi et al., 2009; Cook and Stevenson,
2010)</xref>
        , neural- or Transformer-based
        <xref ref-type="bibr" rid="ref11 ref12 ref17 ref20 ref5">(Hamilton
et al., 2016; Boleda et al., 2019; Shoemark et al.,
2019; Schlechtweg et al., 2019; Giulianelli et al.,
2020)</xref>
        , etc.). In our work, we focus on dense
word representations
        <xref ref-type="bibr" rid="ref15">(Mikolov et al., 2013)</xref>
        , due
to their high effectiveness, as demonstrated in prior
work.
      </p>
      <p>
        Systems operating on representations such as
those derived from Skip-gram or Continuous
Bag-of-Words leverage in most cases
deterministic approaches using mathematical matrix
transformations
        <xref ref-type="bibr" rid="ref1 ref12 ref24">(Hamilton et al., 2016; Azarbonyad et
al., 2017; Tsakalidis et al., 2019)</xref>
        , such as
Orthogonal Procrustes (Schönemann, 1966), or
machine learning models
        <xref ref-type="bibr" rid="ref11 ref23">(Tsakalidis and Liakata,
2020)</xref>
        . The goal of these approaches is to learn
a mapping between the word vectors that have
been trained independently by leveraging
textual information from two or more different
periods of time. The common standard for
measuring the level of diachronic semantic change of a
word under this setting is to use a similarity
measure (e.g., cosine distance) on the aligned space
– i.e., after the mapping step is complete
        <xref ref-type="bibr" rid="ref25 ref8">(Turney
and Pantel, 2010)</xref>
        .
      </p>
      <p>
        <xref ref-type="bibr" rid="ref9">Dubossarsky et al. (2017)</xref>
        argue that using
cosine distance introduces bias in the system,
triggered by word frequency variations.
        <xref ref-type="bibr" rid="ref22">Tan et
al. (2015)</xref>
        only use the vectors of the most
frequent terms to find the transformation matrix,
and then calculate the similarity for the
remaining terms after applying the
transformation to the source matrix. Incremental-update
approaches
        <xref ref-type="bibr" rid="ref13 ref5">(Kim et al., 2014; Boleda et al., 2019)</xref>
        compare word shift across years without a
matrix transformation: the embedding for each
time frame is initialised from the previous time
slice, using the intersection of words between the
datasets. The Temporal Word
Embeddings with a Compass (TWEC) approach
        <xref ref-type="bibr" rid="ref6">(Carlo et al., 2019)</xref>
        freezes selected vectors of the model
architecture: a base embedding is trained first, and a
parallel embedding for each time period is then
learned from its frozen vectors.
      </p>
      <p>Our approaches, detailed in Section 4,
follow and compare different methodologies from
prior work based on (a) Orthogonal Procrustes
alignment, (b) machine learning models and (c)
aligned word embeddings across different time
periods.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Task Description</title>
      <p>
        The task was introduced by
        <xref ref-type="bibr" rid="ref7">(Cignarella et al.,
2020)</xref>
        and is defined as follows:
      </p>
      <p>Given two diachronic corpora, an
unsupervised diachronic lexical semantics
classifier should be able to find the
optimal mapping to compare the diachronic
corpora and classify a set of test
words into one of two classes: 0 for
stable words and 1 for words whose meaning
has shifted.</p>
      <p>We were provided with two corpora in the
Italian language, each from a different time
period, and we developed several methods in order
to classify a word in the given test set as
“semantically shifted” or “stable” across the two time
periods. The test set included 18 observed words –
12 stable and 6 semantically shifted examples.</p>
    </sec>
    <sec id="sec-4">
      <title>4 Our Approach</title>
      <p>Here we outline our approaches for detecting
words whose lexical semantics have changed.</p>
    </sec>
    <sec id="sec-5">
      <title>4.1 Generating Word Vectors</title>
      <p>
        Word representations Wi for the period Ti were
generated in two ways:
(a) IND: via Continuous Bag of Words (CBOW)
and Skip-gram (SG)
        <xref ref-type="bibr" rid="ref15">(Mikolov et al., 2013)</xref>
        applied
to each year independently;
(b) CMPS: via the Temporal Word Embeddings
with a Compass (TWEC) approach
        <xref ref-type="bibr" rid="ref6">(Carlo et al.,
2019)</xref>
        , where a single model (CBOW or SG) is
first trained over the merged corpus; then, SG (or
CBOW) is applied to each
year independently, by initialising and freezing
the weights of the model based on the output of
the first base-model pass and learning only the
contextual part of the representations for that
year.
      </p>
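The IND/CMPS distinction above can be illustrated with a toy skip-gram-with-negative-sampling loop in numpy. This is a sketch under assumptions, not the authors' gensim-based code: the base pass learns both target and context matrices on the merged data, while the per-period CMPS passes keep the base context ("compass") matrix frozen, so the resulting per-period spaces stay directly comparable. All names and data here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(pairs, vocab_size, dim=8, C=None, epochs=30, lr=0.1, k=2):
    """Train a target matrix W on (target, context) pairs via SGNS.
    If a context matrix C is given, it is frozen (the compass);
    otherwise C is trained jointly with W."""
    W = rng.normal(scale=0.1, size=(vocab_size, dim))
    frozen = C is not None
    if not frozen:
        C = rng.normal(scale=0.1, size=(vocab_size, dim))
    for _ in range(epochs):
        for t, c in pairs:
            # one positive context plus k random negative samples
            for ctx, label in [(c, 1.0)] + [(int(n), 0.0) for n in rng.integers(0, vocab_size, k)]:
                score = 1.0 / (1.0 + np.exp(-np.dot(W[t], C[ctx])))
                g = lr * (label - score)
                grad_w = g * C[ctx]
                if not frozen:
                    C[ctx] = C[ctx] + g * W[t]
                W[t] = W[t] + grad_w
    return W, C

vocab = 5
pairs_t0 = [(0, 1), (1, 0), (2, 3)]   # toy (target, context) pairs at T0
pairs_t1 = [(0, 1), (1, 0), (2, 4)]   # word 2 appears in a new context at T1

# CMPS: base pass on the merged pairs, then per-period passes with frozen C.
_, C_base = train(pairs_t0 + pairs_t1, vocab)
W_t0, _ = train(pairs_t0, vocab, C=C_base)
W_t1, _ = train(pairs_t1, vocab, C=C_base)

# Because both W_t0 and W_t1 live in the compass space, they can be
# compared directly, with no alignment step.
cos = float(np.dot(W_t0[2], W_t1[2]) / (np.linalg.norm(W_t0[2]) * np.linalg.norm(W_t1[2])))
print(cos)
```

In the IND setting, by contrast, each period would be trained with `C=None`, and the two resulting spaces would need an explicit alignment step (Section 4.4) before any comparison.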
      <p>In both cases, we used gensim
(https://radimrehurek.com/gensim/) with default
settings. Sentences were tokenised using the
simple split function for flattened sentences
provided by the organisers, without any further
pre-processing. Although there are many
approaches to generating word representations (e.g.,
using syntactic rules), we focused on 1-gram
representations using CBOW and SG, without
considering word lemmas or Part-of-Speech tags.</p>
    </sec>
    <sec id="sec-6">
      <title>4.2 Measuring Semantic Change</title>
      <p>We employ the cosine similarity for measuring
the level of semantic change of a word. Given
two word vectors wT 0, wT 1, semantic change
between them is defined as follows:</p>
      <p>
cos(w_T0, w_T1) = (w_T0 · w_T1) / (||w_T0|| ||w_T1||)
Though alternative methods have been
introduced in the literature (e.g., neighbourhood-based
comparison via the top five most similar words
        <xref ref-type="bibr" rid="ref1">(Azarbonyad
et al., 2017)</xref>
        ), we opted for the similarity
metric which is most widely used in related work
        <xref ref-type="bibr" rid="ref12 ref20 ref24">(Hamilton et al., 2016; Shoemark et al., 2019;
Tsakalidis et al., 2019)</xref>
        .
      </p>
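The measure above can be written as a one-line numpy helper. This is an illustrative sketch, not the authors' code:

```python
import numpy as np

# cos(w_T0, w_T1) = (w_T0 · w_T1) / (||w_T0|| ||w_T1||), as in Section 4.2.
def cosine_similarity(w_t0, w_t1):
    """Cosine similarity between a word's vectors in the two periods."""
    return float(np.dot(w_t0, w_t1) / (np.linalg.norm(w_t0) * np.linalg.norm(w_t1)))

# A word whose vector barely rotates between T0 and T1 scores near 1;
# an orthogonal rotation scores 0.
stable = cosine_similarity(np.array([1.0, 2.0]), np.array([1.1, 2.1]))
shifted = cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
print(stable, shifted)  # stable is close to 1.0, shifted is 0.0
```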
    </sec>
    <sec id="sec-7">
      <title>4.3 Evaluation Sets</title>
      <p>The challenge expects lexical change
detection to be done in an unsupervised fashion
(i.e., no word labels are provided). Thus,
we considered stop words (SW; from
https://github.com/stopwords-iso/stopwords-it)
and all of the
other common words (CW) in T0 and T1 as our
training and evaluation sets interchangeably.</p>
    </sec>
    <sec id="sec-8">
      <title>4.4 Semantic Change Detection Methods</title>
      <p>
        We employed the following approaches for
detecting words whose lexical semantics have
changed:
(a) Orthogonal Procrustes (OP): Due to the
stochastic nature of CBOW/SG, the resulting
word vectors W0 and W1 in IND were not
aligned. Orthogonal Procrustes
        <xref ref-type="bibr" rid="ref12">(Hamilton et
al., 2016)</xref>
        tackles this issue by aligning W1 based
on W0. The level of semantic shift of a word is
calculated by measuring the cosine similarity
between the aligned vectors. For evaluation
purposes, we measured the cosine similarity
of the stop words between the two aligned
matrices. Higher values indicate a better model
(i.e., stop words retain their meaning over time).
(b) Feed-Forward Neural Network (FFNN): We
trained a FFNN that leverages IND to predict
W1 based on W0. The level of semantic shift of
a word in a test set is calculated by measuring
the cosine similarity between the predicted W*1
and the actual W1. For evaluation purposes, we measure
the cosine similarity between the actual and
predicted representations of words in T1. Higher
values for stop-words indicate a better model.
(c) Linear Regression (LR): We employed an
ordinary linear mapping with a least-squares error
objective function (scikit-learn,
https://scikit-learn.org/stable/). The task and the evaluation
setting were identical to those of the FFNN.
(d) Temporal Word Embeddings with a Compass
(TWEC)
        <xref ref-type="bibr" rid="ref6">(Carlo et al., 2019)</xref>
        : Working on the
CMPS vectors, the level of semantic shift of
a word is calculated by measuring the cosine
similarity between its T0 and T1 vectors directly.
      </p>
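Method (a) can be sketched with numpy's SVD. This is an illustrative stand-in for the pipeline, using random matrices in place of the IND embeddings: here W1 is an orthogonally rotated copy of W0, so alignment should recover W0 almost exactly.

```python
import numpy as np

def procrustes_align(W0, W1):
    """Return W1 @ Q, where Q is the orthogonal matrix minimising
    the Frobenius norm of (W1 Q - W0), via SVD (Schönemann, 1966)."""
    U, _, Vt = np.linalg.svd(W1.T @ W0)
    Q = U @ Vt
    return W1 @ Q

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
W0 = rng.normal(size=(100, 20))                       # vectors at T0 (one row per word)
W1 = W0 @ np.linalg.qr(rng.normal(size=(20, 20)))[0]  # rotated copy, standing in for T1

W1_aligned = procrustes_align(W0, W1)
# After alignment, every word is mapped back near its T0 vector.
mean_cos = np.mean([cosine(W0[i], W1_aligned[i]) for i in range(100)])
print(mean_cos)  # close to 1.0
```

For method (c), an analogous unconstrained linear map can be fitted with `np.linalg.lstsq(W0, W1, rcond=None)` instead of the orthogonal Q.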
      <p>Notation In the rest of this paper, we denote
a model M trained on CW (SW) as M_CW
(M_SW). For the case of OP, the training
process involves learning an alignment based on a
specific word set (CW or SW). Note that this
notation does not apply to TWEC, since the word
vectors in the two time periods can be directly
compared against each other – thus the level of
semantic change can be calculated directly (i.e.,
there is no need to learn any mapping between
W0 and W1). Finally, we add a subscript CBOW or
SG to our models, denoting the type of algorithm
that was used for generating the respective
embeddings that are fed to our model.</p>
      <p>Model Selection We selected for application
to the test set the models yielding a high average
cosine similarity on stop words.</p>
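The selection rule reduces to an argmax over stop-word averages. A minimal sketch, using the SW cosine-similarity averages reported in Table 1 purely for illustration:

```python
# Candidate models with their average stop-word cosine similarity
# (CS_SW_avg); pick the candidate with the highest average.
candidates = {"OP_SW": 0.748, "LR_SW": 0.854, "FFNN_SW": 0.769}
best = max(candidates, key=candidates.get)
print(best)  # → LR_SW
```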
    </sec>
    <sec id="sec-9">
      <title>4.5 Word Classification</title>
      <p>
        As per the task guidelines
        <xref ref-type="bibr" rid="ref7">(Cignarella et al.,
2020)</xref>
        , words can fall into one of two
categories: 0, the target word does not change
meaning between T0 and T1; and 1, the target word
changes its meaning between T0 and T1. For
all of our submitted models, we considered all
the words with cosine similarity below the mean
as shifted words and labelled them with 1. We
further investigated the models' ability to detect
words lying two standard deviations below the
mean (μ − 2σ), i.e., outlier detection. Interestingly, some
of the models, including LR and FFNN_CW_CBOW,
showed an increase in accuracy.
      </p>
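The two labelling rules can be sketched as follows; `label_shifted` is a hypothetical helper operating on made-up similarity scores, not code from the paper:

```python
import numpy as np

def label_shifted(sims, outlier_rule=False):
    """Label words as shifted (1) or stable (0) from their cosine
    similarities: below the mean, or below mu - 2*sigma when the
    stricter outlier rule is used."""
    sims = np.asarray(sims, dtype=float)
    mu, sigma = sims.mean(), sims.std()
    threshold = mu - 2.0 * sigma if outlier_rule else mu
    # np.less(sims, threshold) is True for words below the threshold
    return np.less(sims, threshold).astype(int).tolist()

sims = [0.95, 0.93, 0.96, 0.94, 0.92, 0.40]  # last word clearly shifted
print(label_shifted(sims))                    # [0, 0, 0, 0, 0, 1]
print(label_shifted(sims, outlier_rule=True))
```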
    </sec>
    <sec id="sec-10">
      <title>5 Results</title>
      <p>The results are shown in Table 1, where
we split our results based on model architecture
(SG and CBOW) and the model's
training word sets, Stop-Words (SW) and
Common-Words (CW).</p>
      <sec id="sec-10-1">
        <p>Table 1 (as recovered): per training word set
and model, average stop-word cosine similarity
(CS_SW_avg) and accuracy (%μ):
SW: OP 0.748 / 0.778; LR 0.854 / 0.333; FFNN 0.769 / 0.333.
CW: OP 0.464 / 0.389; LR 0.409 / 0.333; FFNN 0.658 / 0.333.
TWEC: 0.722 / 0.722.</p>
        <p>For models based on linear
transformation, our top performing models under the
below-average cosine similarity labelling were TWEC_CBOW
(0.833), OP_SW_SG (0.778), OP_SW_CBOW (0.778) and
TWEC_SG (0.722). As shown in Figure 5, we
observe that these models tend to have skewed
distributions for stop words, where the vast
majority of stop words are assigned high cosine
similarity scores. However, other models did not
show this skewness, e.g. OP_CW_SG (0.389) and
OP_CW_CBOW (0.611). When labelling the change
based on variance (μ − 2σ), as in outlier
detection, some models showed an increase over
the dummy classifier's performance. For
instance, OP_CW_SG showed an increase in
performance from 0.389 to 0.778, showing that words
with low average cosine similarity lie in the
tail of the similarity distribution. Similarly, models
based on reducing the similarity error between
the predicted and actual vectors, e.g. LR and
FFNN under the outlier detection
methodology, tend to achieve better performance,
including LR_SW_CBOW, FFNN_SW_CBOW and
FFNN_CW_CBOW, where LR_SW_CBOW showed an
increase from the frequency classifier's baseline
(0.500) to (0.778), and LR_SW_CBOW showed an
increase from the dummy classifier's performance
(0.333) to (0.722).</p>
        <p>Ranking methods, average ranking (μ_rank)
and Recall (R), require prior knowledge of the
evaluation labels to be useful for
evaluating the reliability of the model of interest.
Therefore, we further investigated the reliability of
our experimental models, using μ_rank and R at
50% (R_p50) and 30% (R#6). Although (R_p50)
signals OP_SW_SG, OP_SW_CBOW, FFNN_CW_SG and
TWEC_CBOW as equally good, μ_rank ranked the top
models as OP_SW_SG, OP_SW_CBOW, LR_SW_CBOW
and then TWEC_CBOW, with (0.222, 0.270, 0.278 and
0.286), respectively. Additionally, under extreme
conditions, OP_SW_CBOW ranked better than all,
including TWEC_CBOW. This shows that under
extreme conditions, a good method is one
which keeps providing out-of-distribution signals
for changing words, and that careful
consideration is needed of the distribution of the
words before and after the alignments, as in OP.
In general, CBOW-based models showed better
performance than SG-based models, with
average accuracy of (%μ 0.564 and %μ−2σ 0.667)
compared to (%μ 0.460 and %μ−2σ 0.524) for
words labelled by mean and variance,
respectively. Further, alignment using non-changing
words (e.g. stop-words) yields higher
performance than using all common words, with
average stop-word cosine similarity of (CS_SW_avg
0.777) compared to (CS_SW_avg 0.431), which is
expected because SW-based models learn the
optimal mapping with less noise than CW-based
models.</p>
        <p>Our work provides a comprehensive analysis of
diachronic lexical methods for Italian introduced
in previous work. For models that are based
on linear matrix transformation, including TWEC
and OP, we find a relation between high average
stop-word similarity and accuracy. Further,
CBOW tends to achieve better results than the SG
architecture in most experiments. Visually, we
find that a visibly skewed distribution, showing
the tendency of stop words to have high cosine
similarity scores, leads to an effective means of
capturing semantic shift. We also showed that, when
evaluating the models using different methods,
TWEC_CBOW achieved top performance,
followed by OP_SW and OP_CW_SG, and LR using the
outlier detection methodology. Further, FFNN
showed high recall (R_p50) by ranking changed
words with the lowest cosine similarity on the testing
set, similar to OP_SW and TWEC_CBOW. This
provides promising insights, encouraging further
investigation of neural network models using
different languages and larger datasets.</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>7 Conclusions</title>
      <p>In this report, we describe and compare our
models submitted to the DIACR-Ita 2020 shared
task, which assessed the ability to classify
the semantic shift of words in Italian. We show
that the TWEC model yields better performance
than Orthogonal Procrustes, labelling all words
scoring below the average cosine similarity as
semantically shifted words, i.e. words with altered
semantics over the two time periods.
Additionally, we showed that using an outlier detection
methodology yields better results in
prediction-based models such as Linear Regression and
the Feed-Forward Neural Network, boosting the
performance significantly compared to the
baselines and dummy classifier.</p>
      <p>In the future we aim to focus on fine-tuning
SOTA pre-trained language models such as ELMo
and BERT for word-level semantic-shift
detection, as well as investigating the ability of
dynamic graph models to capture word
evolution.</p>
    </sec>
    <sec id="sec-12">
      <title>8 Acknowledgments</title>
      <p>This research utilised Queen Mary’s Apocrita
HPC facility, supported by QMUL Research-IT.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Hosein</given-names>
            <surname>Azarbonyad</surname>
          </string-name>
          , Mostafa Dehghani, Kaspar Beelen, Alexandra Arkut, Maarten Marx, and
          <string-name>
            <given-names>Jaap</given-names>
            <surname>Kamps</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Words are malleable: Computing semantic shifts in political and media discourse</article-title>
          .
          <source>International Conference on Information and Knowledge Management, Proceedings, Part F1318(3)</source>
          :
          <fpage>1509</fpage>
          -
          <lpage>1518</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          and
          <string-name>
            <given-names>Barbara</given-names>
            <surname>McGillivray</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Exploiting the web for semantic change detection</article-title>
          .
          <source>In International Conference on Discovery Science</source>
          , pages
          <fpage>194</fpage>
          -
          <lpage>208</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          , Annalina Caputo, Tommaso Caselli, Pierluigi Cassotti, and
          <string-name>
            <given-names>Rossella</given-names>
            <surname>Varvara</surname>
          </string-name>
          .
          <year>2020a</year>
          .
          <article-title>DIACR-Ita @ EVALITA2020: Overview of the EVALITA 2020 Diachronic Lexical Semantics (DIACR-Ita) Task</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020), Online</source>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Danilo Croce, Maria Di Maro, and
          <string-name>
            <given-names>Lucia C.</given-names>
            <surname>Passaro</surname>
          </string-name>
          . 2020b.
          <article-title>Evalita 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for italian</article-title>
          .
          In Valerio Basile
          , Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ),
          <article-title>Online</article-title>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Gemma</given-names>
            <surname>Boleda</surname>
          </string-name>
          , Marco Del Tredici, and Raquel Fernández
          .
          <year>2019</year>
          .
          <article-title>Short-term meaning shift: a distributional exploration</article-title>
          .
          <source>In Proceedings of NAACL-HLT 2019, Minneapolis, USA</source>
          , pages
          <fpage>2069</fpage>
          -
          <lpage>2075</lpage>
          . ACL.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Di Carlo</surname>
          </string-name>
          , Federico Bianchi, and
          <string-name>
            <given-names>Matteo</given-names>
            <surname>Palmonari</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Training temporal word embeddings with a compass</article-title>
          .
          <source>CoRR</source>
          , abs/1906.02376.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Alessandra Teresa</given-names>
            <surname>Cignarella</surname>
          </string-name>
          , Mirko Lai, Cristina Bosco, Viviana Patti, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Overview of the EVALITA 2020 Task on Stance Detection in Italian Tweets (SardiStance)</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA</source>
          <year>2020</year>
          ).
          <article-title>CEUR-WS.org</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Paul</given-names>
            <surname>Cook</surname>
          </string-name>
          and
          <string-name>
            <given-names>Suzanne</given-names>
            <surname>Stevenson</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Automatically Identifying Changes in the Semantic Orientation of Words</article-title>
          .
          <source>In Proceedings of the Seventh conference on International Language Resources and Evaluation.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Haim</given-names>
            <surname>Dubossarsky</surname>
          </string-name>
          , Daphna Weinshall, and
          <string-name>
            <given-names>Eitan</given-names>
            <surname>Grossman</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Outta control: Laws of semantic change and inherent biases in word representation models</article-title>
          .
          <source>In Proceedings of the 2017 conference on empirical methods in natural language processing</source>
          , pages
          <fpage>1136</fpage>
          -
          <lpage>1145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Lea</given-names>
            <surname>Frermann</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mirella</given-names>
            <surname>Lapata</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>A bayesian model of diachronic meaning change</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          ,
          <volume>4</volume>
          :
          <fpage>31</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Mario</given-names>
            <surname>Giulianelli</surname>
          </string-name>
          , Marco Del Tredici, and Raquel Fernández
          .
          <year>2020</year>
          .
          <article-title>Analysing Lexical Semantic Change with Contextualised Word Representations</article-title>
          .
          <source>In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          , Online, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>William L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jure</given-names>
            <surname>Leskovec</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Diachronic word embeddings reveal statistical laws of semantic change</article-title>
          .
          <source>arXiv preprint arXiv:1605.09096</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Yoon</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yi-I</given-names>
            <surname>Chiu</surname>
          </string-name>
          , Kentaro Hanaki, Darshan Hegde, and
          <string-name>
            <given-names>Slav</given-names>
            <surname>Petrov</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Temporal analysis of language through neural language models</article-title>
          .
          <source>In Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science</source>
          , pages
          <fpage>61</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Andrey</given-names>
            <surname>Kutuzov</surname>
          </string-name>
          , Lilja Øvrelid, Terrence Szymanski, and
          <string-name>
            <given-names>Erik</given-names>
            <surname>Velldal</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Diachronic word embeddings and semantic shifts: a survey</article-title>
          .
          <source>In Proceedings of the 27th International Conference on Computational Linguistics</source>
          , pages
          <fpage>1384</fpage>
          -
          <lpage>1397</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Ilya Sutskever, Kai Chen, Greg S Corrado, and
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Eyal</given-names>
            <surname>Sagi</surname>
          </string-name>
          , Stefan Kaufmann, and
          <string-name>
            <given-names>Brady</given-names>
            <surname>Clark</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Semantic Density Analysis: Comparing Word Meaning across Time and Phonetic Space</article-title>
          .
          <source>In Proceedings of the Workshop on Geometrical Models of Natural Language Semantics</source>
          , pages
          <fpage>104</fpage>
          -
          <lpage>111</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Dominik</given-names>
            <surname>Schlechtweg</surname>
          </string-name>
          , Anna Hätty, Marco Del Tredici, and Sabine Schulte im Walde
          .
          <year>2019</year>
          .
          <article-title>A wind of change: Detecting and evaluating lexical semantic change across times and domains</article-title>
          .
          <source>In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>732</fpage>
          -
          <lpage>746</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Dominik</given-names>
            <surname>Schlechtweg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Barbara</given-names>
            <surname>McGillivray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Simon</given-names>
            <surname>Hengchen</surname>
          </string-name>
          , Haim Dubossarsky, and
          <string-name>
            <given-names>Nina</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>SemEval-2020 task 1: Unsupervised lexical semantic change detection</article-title>
          .
          <source>arXiv preprint arXiv:2007.11464</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Peter H</given-names>
            <surname>Schönemann</surname>
          </string-name>
          .
          <year>1966</year>
          .
          <article-title>A Generalized Solution of the Orthogonal Procrustes Problem</article-title>
          .
          <source>Psychometrika</source>
          ,
          <volume>31</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Philippa</given-names>
            <surname>Shoemark</surname>
          </string-name>
          , Farhana Ferdousi Liza,
          <string-name>
            <given-names>Dong</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Scott</given-names>
            <surname>Hale</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Barbara</given-names>
            <surname>McGillivray</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Room to Glo: A systematic comparison of semantic change detection approaches with word embeddings</article-title>
          .
          <source>In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          , pages
          <fpage>66</fpage>
          -
          <lpage>76</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Nina</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          , Lars Borin, and
          <string-name>
            <given-names>Adam</given-names>
            <surname>Jatowt</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Survey of computational approaches to lexical semantic change</article-title>
          .
          <source>arXiv preprint arXiv:1811.06278</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Luchen</given-names>
            <surname>Tan</surname>
          </string-name>
          , Haotian Zhang, Charles Clarke, and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Smucker</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Lexical comparison between wikipedia and twitter corpora by using word embeddings</article-title>
          .
          <source>In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)</source>
          , pages
          <fpage>657</fpage>
          -
          <lpage>661</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Adam</given-names>
            <surname>Tsakalidis</surname>
          </string-name>
          and
          <string-name>
            <given-names>Maria</given-names>
            <surname>Liakata</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Autoencoding word representations through time for semantic change detection</article-title>
          .
          <source>arXiv preprint arXiv:2004.13703</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Adam</given-names>
            <surname>Tsakalidis</surname>
          </string-name>
          , Marya Bazzi, Mihai Cucuringu, Pierpaolo Basile, and
          <string-name>
            <given-names>Barbara</given-names>
            <surname>McGillivray</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Mining the UK web archive for semantic change detection</article-title>
          .
          <source>In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)</source>
          , pages
          <fpage>1212</fpage>
          -
          <lpage>1221</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>Peter D</given-names>
            <surname>Turney</surname>
          </string-name>
          and
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Pantel</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>From frequency to meaning: Vector space models of semantics</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <volume>37</volume>
          :
          <fpage>141</fpage>
          -
          <lpage>188</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>