<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>DIACR-Ita @ EVALITA2020: Overview of the EVALITA2020 Diachronic Lexical Semantics (DIACR-Ita) Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pierpaolo Basile</string-name>
          <email>pierpaolo.basile@uniba.it</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Annalina Caputo</string-name>
          <email>annalina.caputo@dcu.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tommaso Caselli</string-name>
          <email>t.caselli@rug.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierluigi Cassotti</string-name>
          <email>pierluigi.cassotti@uniba.it</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rossella Varvara</string-name>
          <email>rossella.varvara@unifi.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ADAPT Centre, School of Computing, Dublin City University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>CLCG, University of Groningen</institution>
          ,
          <country country="NL">Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>DILEF, University of Florence</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Dept. of Computer Science, University of Bari</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper describes the first edition of the “Diachronic Lexical Semantics” (DIACR-Ita) task at the EVALITA 2020 campaign. The task challenges participants to develop systems that can automatically detect whether a given word has changed its meaning over time, given contextual information from corpora. At its first edition, the task attracted 9 participating teams and collected a total of 36 submitted runs.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>The Diachronic Lexical Semantics (DIACR-Ita)
task focuses on the automatic recognition of
lexical semantic change over time, combining
computational and historical linguistics.
The aim of the task can be briefly described as
follows: given contextual information from corpora,
systems are challenged to detect whether a given word
has changed its meaning over time.</p>
      <p>
        Word meanings can evolve in different ways.
They can undergo pejoration or amelioration
(when meanings become more negative or more
positive, respectively), or they can undergo
broadening (also referred to as generalization or
extension) or narrowing (also known as
restriction or specialization). For instance, the
English word dog is a clear case of broadening,
since its current, more general meaning developed from
the late Old English “dog of a powerful breed”
        <xref ref-type="bibr" rid="ref27">(Traugott, 2006)</xref>
        . Conversely, the Old English
word deor, with the general meaning of “animal”,
narrowed into present-day English deer. Semantic
changes can be further classified on the basis of the
cognitive process that gave rise to them, i.e. either
metonymy or metaphor. Lastly, it is
possible to distinguish between changes due to
language-internal and language-external factors
        <xref ref-type="bibr" rid="ref14">(Hollmann,
2009)</xref>
        . The latter usually reflect changes in
society, as in the case of technological advancements
(e.g. cell, from the meaning of “prison cell” to
“cell phone”).
      </p>
      <p>The problem of the automatic analysis of
lexical semantic change is gaining momentum in the
Natural Language Processing (NLP) and
Computational Linguistics (CL) communities, as shown
by the growing number of publications on the
diachronic analysis of language and by the
organisation of related events, such as the 1st International
Workshop on Computational Approaches to
Historical Language Change
(https://languagechange.org/events/2019-acl-lcworkshop/)
and the project “Towards Computational Lexical
Semantic Change Detection”
(https://languagechange.org/). Following this trend,
SemEval 2020 hosted for the first time a task on the
automatic recognition of lexical semantic change:
SemEval 2020 Task 1 - Unsupervised Lexical
Semantic Change Detection
(https://competitions.codalab.org/competitions/20948)
(Schlechtweg et al., 2020). While this task targets a
number of different languages, namely Swedish, Latin,
and German, Italian is not present.</p>
    </sec>
    <sec id="sec-3">
      <p>
        Many approaches, data sets, and evaluation
strategies exist for detecting semantic change, or
drift. Most approaches rely on diachronic word
embeddings: some of these are created by
post-processing static word embeddings, as in
Hamilton et al. (2016), while others create
dynamic word embeddings whose vectors share
the same space for all time periods
        <xref ref-type="bibr" rid="ref11 ref18 ref22 ref25 ref28 ref8">(Del Tredici
et al., 2016; Yao et al., 2018; Rudolph and Blei,
2018; Dubossarsky et al., 2019)</xref>
        . Recent work
exploits word sense induction algorithms to
discover semantic shifts
        <xref ref-type="bibr" rid="ref15 ref24">(Tahmasebi and Risse, 2017;
Hu et al., 2019)</xref>
        by analyzing how induced senses
change over time. Finally, Gonen et al. (2020)
propose a simple approach based on the intersection
of a word’s neighbourhoods across two corpora: the
neighbourhood of a word is computed separately in each
corpus, and the size of the intersection then yields a
measure of the semantic shift. The neighbourhood
in each corpus can be computed using the cosine
similarity between word embeddings built on that
same corpus, without any vector alignment. A
more complete state of the art is critically and
concisely surveyed in
        <xref ref-type="bibr" rid="ref18 ref25 ref26">(Tahmasebi et al., 2018; Kutuzov et al., 2018; Tang,
2018)</xref>
        .
      </p>
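      <p>As an illustrative sketch (not the authors’ code), the neighbour-intersection idea of Gonen et al. (2020) can be implemented as follows; the vocabulary and the toy vectors are invented stand-ins for embeddings trained separately on the two corpora:</p>

```python
import numpy as np

def top_k_neighbors(word, vocab, vectors, k=3):
    """Return the k nearest neighbours of `word` by cosine similarity."""
    idx = vocab.index(word)
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = normed @ normed[idx]
    sims[idx] = -np.inf  # exclude the word itself
    return {vocab[i] for i in np.argsort(-sims)[:k]}

def shift_score(word, vocab, vecs_c1, vecs_c2, k=3):
    """Smaller neighbourhood overlap across the two corpora -> larger shift."""
    n1 = top_k_neighbors(word, vocab, vecs_c1, k)
    n2 = top_k_neighbors(word, vocab, vecs_c2, k)
    return 1.0 - len(n1 & n2) / k

rng = np.random.default_rng(0)
vocab = ["cane", "gatto", "telefono", "cellula", "rete"]
vecs_c1 = rng.normal(size=(5, 50))
vecs_c2 = vecs_c1.copy()  # identical spaces: no shift at all
print(shift_score("cane", vocab, vecs_c1, vecs_c2))  # 0.0
```

      <p>With identical spaces the neighbourhoods coincide and the score is 0; embeddings trained on genuinely different corpora would yield scores closer to 1 for shifted words.</p>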
      <p>
        Almost all of the previously mentioned
methods use English as the target language for
diachronic analysis, leaving other languages
under-explored. To date, only one evaluation has
been carried out on Italian, using the Kronos-it
dataset
        <xref ref-type="bibr" rid="ref3">(Basile et al., 2019)</xref>
        .
      </p>
      <p>
        The DIACR-Ita task at the EVALITA 2020
campaign
        <xref ref-type="bibr" rid="ref4 ref5">(Basile et al., 2020b)</xref>
        fosters the
implementation of new systems purposely designed
for the Italian language. To achieve this goal, a
new dataset for the evaluation of lexical semantic
change in Italian has been developed, based on the
“L’Unità” corpus
        <xref ref-type="bibr" rid="ref2 ref4 ref5">(Basile et al., 2020a)</xref>
        . This is
the first Italian dataset manually annotated with
semantic shifts between two different time periods.
      </p>
      <sec id="sec-3-1">
        <title>Task Description</title>
        <p>The goal of DIACR-Ita is to establish whether a set of
target words changed their meaning across two time
periods, T1 and T2, where T1 precedes T2.</p>
        <p>Following the SemEval 2020 Task 1 settings,
we focus on the comparison of two time periods.
In this way, we tackle two issues:
1. We reduce the number of time periods for
which data has to be annotated;
2. We reduce the task complexity, allowing for
the use of different model architectures and
thus widening the range of potential
participants.</p>
        <p>During the test phase, participants were
provided with two corpora C1 and C2 (for the time
periods T1 and T2, respectively) and a list of target
words. For each target word, systems have to
decide whether or not the word changed its meaning
between T1 and T2, according to its occurrences in
sentences in C1 and C2. For instance, the meaning
of the word “imbarcata” is known to have
expanded from T1 to T2, i.e., it has acquired a new
sense: the word originally referred to an acrobatic
manoeuvre of aeroplanes, but nowadays it is also
used to refer to the state of being deeply in love
with someone. This will be reflected in different
occurrences of the word usage in sentences between C1
and C2.</p>
        <p>The task is formulated as a closed task, i.e.
participants must train their models only on the data
provided in the task. Participants may, however,
rely on pre-trained word embeddings, provided these
were not trained on additional diachronic Italian
corpora: only synchronic corpora are allowed.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Data</title>
        <p>This section provides an overview of the datasets
that were made available to the participants in the
two different stages of the evaluation challenge,
namely trial and test.</p>
        <sec id="sec-3-2-1">
          <title>Trial data</title>
          <p>The trial phase corresponds to the evaluation
window in which the participants have to build their
systems before the official test data are released.
The following data were provided:
• An example of 5 trial target words for which
predictions are needed;
• An example of gold standard for the trial
target words;
• A sample submission file for the trial target
words;
• Two trial corpora that participants could use
to develop their models and check the
compliance of the generated output with the
required format;
• An evaluation script and some additional utility
scripts for managing corpora.</p>
          <p>Trial data do not reflect the actual data from C1
and C2. The sample training corpora and target
words were artificially built just to provide an
example of the data format participants needed for
developing their systems. Since the training corpus
is publicly available on the Internet, we decided not
to release these data during the trial phase, to prevent
participants from identifying the source data and,
consequently, the potential set of target words.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>Test data</title>
          <p>For the test phase, the following data were
provided:
• A diachronic split of the “L’Unità” corpus
into the two sub-corpora, C1 and C2, each
belonging to a specific time period;
• 18 target words, among which 6 were
identified as targets of semantic change
between the two time periods.</p>
          <p>
            Corpus Creation The “L’Unità” diachronic
corpus
            <xref ref-type="bibr" rid="ref2 ref4 ref5">(Basile et al., 2020a)</xref>
            is a collection of
documents extracted from the digital archive of the
newspaper “L’Unità” (https://archivio.unita.news/).
          </p>
          <p>For the task, the corpus was initially split
into two sub-corpora: C1, corresponding to the
time period T1 = [1945, 1970], and C2,
corresponding to the time period T2 = [1990, 2014].</p>
          <p>To facilitate participants in the closed-task
formulation, the corpora were provided in a
pre-processed format. In particular, we adopted a
tab-separated format, with one token per line. For
each token, we provide its corresponding
part-of-speech tag and lemma. Sentences are separated by
empty lines. Data were pre-processed with UDPipe
(http://lindat.mff.cuni.cz/services/udpipe/run.php)
using the ISDT-UD v2.5 model. An
example of the data format is illustrated below.
Questa PRON questo
è AUX essere
una DET uno
frase NOUN frase
. PUNCT .</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <p>Questa PRON questo
è AUX essere
un’ DET uno
altra ADJ altro
frase NOUN frase
. PUNCT .</p>
      <p>Participants are free to combine the available
information as they want. Furthermore, to
facilitate the generation of word embeddings, we made
available a script for generating a format
containing one sentence per line.</p>
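      <p>The conversion can be sketched as follows (an illustrative re-implementation, not the official task script); the column parameter is a hypothetical knob selecting the token or the lemma column:</p>

```python
def vertical_to_sentences(lines, column=0):
    """Convert the tab-separated one-token-per-line format (token, PoS, lemma)
    into one sentence per line. `column` selects 0=token or 2=lemma."""
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():          # empty line = sentence boundary
            if current:
                sentences.append(" ".join(current))
                current = []
        else:
            current.append(line.split("\t")[column])
    if current:                       # flush the last sentence
        sentences.append(" ".join(current))
    return sentences

sample = [
    "Questa\tPRON\tquesto",
    "è\tAUX\tessere",
    "una\tDET\tuno",
    "frase\tNOUN\tfrase",
    ".\tPUNCT\t.",
    "",
]
print(vertical_to_sentences(sample))     # ['Questa è una frase .']
print(vertical_to_sentences(sample, 2))  # ['questo essere uno frase .']
```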
      <p>The whole “L’Unità” diachronic corpus has
been built, cleaned, and annotated automatically.
This process consisted of several steps, namely:
Step 1: Downloading All PDF files are
downloaded from the source site and stored in a folder
structure that mirrors the publication year of each
article.</p>
      <p>Step 2: Text extraction The text is extracted
from the PDF files by using the Apache Tika
library (https://tika.apache.org/). First, the library
tries to extract the embedded text, if present in the
PDF. If this process fails, the internal OCR system
is used. It is important to note that during this step
several OCR errors may occur, for various reasons. The
processing of the early years of publication, i.e., between
1945 and 1948, represented a non-trivial challenge for
the extraction of the textual data. In particular, we
noticed that the page format had a major impact on
the quality of the OCR. In this period, the
newspaper had quite an unconventional format, in which
a few large pages contain many articles scattered
across several columns. This affected the
performance of the OCR due to its failure to properly
identify the column boundaries.</p>
      <p>Step 3: Cleaning In this step, we try to fix some
text extraction issues. We identified two lines of
action, the first dealing with paragraph splits and
the second with noisy text. In the text extraction
process, paragraphs are separated by means of an
empty line. However, word hyphenation can
trigger errors in the paragraph segmentation phase by
wrongly adding empty lines. We addressed this
issue by reconstructing each paragraph on a
single text line, thus ensuring that empty lines are
only used to delimit the actual paragraphs.</p>
    </sec>
    <sec id="sec-7">
      <p>
        In our
case, noisy text corresponds to tokens whose
composing characters are wrongly interpreted by the
OCR, mixing alphabetical characters with
numbers or symbols. Two heuristics were
implemented to limit the amount of noisy text. The first
heuristic requires that a paragraph contain at
least five tokens composed only of alphabetical
characters. The second heuristic requires that at
least 60% of each paragraph consist of words
attested in a dictionary. For this, we did
not use a reference dictionary, but automatically
created one by extracting tokens from the Paisà
corpus
        <xref ref-type="bibr" rid="ref20">(Lyding et al., 2014)</xref>
        . Numbers were
excluded and only alphabetical strings were retained.
The output of the cleaning process is a plain text
file for each year, where each paragraph is
separated by an empty line.
      </p>
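      <p>The two heuristics can be sketched as follows (a minimal re-implementation, not the actual cleaning script; the toy lexicon stands in for the dictionary extracted from the Paisà corpus):</p>

```python
# Keep a paragraph only if (1) it has at least 5 purely alphabetic tokens
# and (2) at least 60% of its tokens are attested in a reference dictionary.
def keep_paragraph(paragraph, dictionary, min_alpha=5, min_attested=0.6):
    tokens = paragraph.split()
    alpha = [t for t in tokens if t.isalpha()]
    if len(alpha) < min_alpha:
        return False
    attested = sum(1 for t in alpha if t.lower() in dictionary)
    return attested / len(tokens) >= min_attested

lexicon = {"questa", "è", "una", "frase", "di", "prova"}
print(keep_paragraph("Questa è una frase di prova", lexicon))  # True
print(keep_paragraph("x9 #! 7q", lexicon))                     # False
```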
      <p>Step 4: Processing All plain text files produced
by the cleaning step are processed by a Python
script that splits each paragraph into sentences and
analyses each sentence with the UDPipe
ISDT-UD v2.5 model. In this way, we obtain tokens,
part-of-speech tags, and lemmas. The processed data
are then stored in the vertical format illustrated in
Section 3.</p>
      <p>After these preparation steps, the valid and
retained data for the task span over a temporal
period between 1948 and 2014. We revised the
initial split of the two sub-corpora as follows: C1
ranges over T1 = [1948, 1970], and C2
over T2 = [1990, 2014]. Table 1 illustrates the
distributions of the tokens across the two time
periods for the sub-corpora. The difference in the
number of tokens between C1 and C2 reflects
differences in the trends in the number of daily
published articles, due to cheaper printing costs and
the availability of new technologies such as the
World Wide Web.</p>
      <sec id="sec-7-1">
        <title>Corpus Statistics</title>
        <p>L’Unità, 1948-1970: 52,287,734 tokens</p>
        <p>L’Unità, 1990-2014: 196,539,403 tokens</p>
        <p>Table 1: Official Training Corpora: Occurrence of Tokens.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <p>Creation of the Gold Standard The selection
of the target words that compose the Gold
Standard data required manual annotation.
Identifying words that have undergone a semantic change
is not an easy task. To boost the identification
of candidate target words, we adopted a
semi-automatic method. In the following paragraphs, we
illustrate our approach in detail.</p>
      <sec id="sec-8-1">
        <title>Step 1: Selection of candidate words</title>
        <p>
          The initial selection of potential candidate words
was based on Kronos-IT
          <xref ref-type="bibr" rid="ref3">(Basile et al., 2019)</xref>
          .
Kronos-IT is a dataset for the evaluation of
semantic change point detection algorithms
for the Italian language, built automatically
by using a web-scraping strategy. In
particular, it exploits the information present in
the online dictionary “Sabatini Coletti”
(https://dizionari.corriere.it/dizionario_italiano/) to
create a pool of words that have undergone
a semantic change. In the dictionary, some
lemmas are tagged with the year of first
attestation of each of their senses; in some cases,
a lemma is associated with multiple years,
attesting the introduction of new senses for
that word. Kronos-IT uses this information to
identify the set of semantically changing words.
We retained those words that were predicted
to have changed their meaning after 1970, so
as to match the temporal periods of the
sub-corpora. In this way, we obtained 106
candidate lemmas.
        </p>
        <p>Step 2: Filtering candidate targets. A
challenging issue is the attestation of the potential
candidate words in both sub-corpora, with a
relatively high number of occurrences so as to
account for different contexts of use.
Frequency, indeed, plays quite a relevant role in
the task: infrequent tokens must be discarded
because they affect the quality of word
representations. The initial list of candidate
targets was further cleaned by removing all
tokens that occur fewer than 20 times in
either corpus. Moreover, we conducted a further
analysis by manually inspecting some
randomly sampled lemma contexts. The aim of
this analysis was to remove targets whose
occurrences are affected by OCR errors.
This analysis was performed by means of
Sketch Engine (https://www.sketchengine.eu/);
in particular, we analysed concordances of each
target word in order to discover OCR errors.
One such word was “toro”, derived from the mistaken
OCR of “loro”. At the end of this process, we
obtained a list of 27 candidate targets for the
annotation.</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <p>Step 3: Manual Annotation. For each target, we
randomly extracted up to 100 sentences from
each sub-corpus (when a target word occurred
fewer than 100 times, all of its occurrences were
annotated). Each sentence was
then annotated by two annotators, who were
asked to assign each occurrence to one of the
meanings of the lemma, according to those
reported in the Sabatini Coletti dictionary. In
case the meaning of the word in a sentence
was not present in the list of senses reported
in the reference dictionary, the annotators
were allowed to add a new sense for the word.
In total, we annotated 2,336 occurrences of
the candidate target words.</p>
      <p>Step 4: Annotation check. All cases of
disagreement were collectively discussed among all
of the annotators to reach a final decision. We
observed that some disagreements were also
due to a biased interpretation of the context
of occurrence by one of the annotators. These
cases mainly concerned short, ambiguous
sentences that prevented a clear identification of
the word meaning. As a result of this step, a
few candidates were removed from the pool
because they occurred in overly
ambiguous contexts.</p>
      <sec id="sec-9-1">
        <title>Step 5: Creation of the gold standard</title>
        <p>We retained as valid instances of lexical semantic
change all those targets that had occurrences
of one specific sense only in T2, and never in
T1. In other words, in the context of this task,
a valid lexical semantic change corresponds
to the acquisition of a new meaning by a
target word. Out of the 23 candidate target
words, only 6 show a semantic
change in T2. All the other targets did not
show a diachronic meaning change. In the
final Gold Standard, we kept 12 candidate
target words that did not change meaning,
obtaining a final set of 18 target words.</p>
        <p>The Gold Standard contains the 18 targets listed as
lemmas, one lemma per line, with an
accompanying label marking whether the lemma has
undergone semantic change (label 1) or not (label 0).</p>
        <p>Participants were given a file containing the 18
target lemmas, one per line, without annotation.
The expected system output is a modification of
this file in which participants annotate each
target lemma with the system prediction (0 or 1).</p>
        <sec id="sec-9-1-1">
          <title>Evaluation</title>
          <p>The task is formulated as a binary
classification problem. Systems’ predictions are evaluated
against the change labels annotated in the Gold
Standard by using accuracy.</p>
          <p>The test set (G) contains both positive (P) and
negative (N) examples, i.e. G = P ∪ N. For
example:</p>
          <p>P = {pilotato, lucciola, ape, rampante}</p>
          <p>N = {brama, processare}</p>
          <p>Negative words are those that did not undergo
a change in their meaning. Systems’ predictions
comprise both positively and negatively classified
targets: Pr = Pr<sub>pos</sub> ∪ Pr<sub>neg</sub>. Then, true positives
(positive targets classified as positive) are
TP = P ∩ Pr<sub>pos</sub>, true negatives (negative targets
classified as negative) are TN = N ∩ Pr<sub>neg</sub>, false
negatives (positive targets classified as negative) are
FN = P ∩ Pr<sub>neg</sub>, and false positives (negative
targets classified as positive) are FP = N ∩ Pr<sub>pos</sub>.
We can then compute the accuracy as:</p>
          <p>Accuracy = (TP + TN) / (TP + TN + FP + FN)</p>
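          <p>The evaluation can be sketched as follows: the gold labels below use the example targets above (changed = 1, stable = 0), while the predictions are invented for illustration:</p>

```python
# Accuracy over 0/1 change labels: the fraction of matching predictions,
# equivalently (TP + TN) / (TP + TN + FP + FN).
def accuracy(gold, predictions):
    assert gold.keys() == predictions.keys()
    correct = sum(1 for w in gold if gold[w] == predictions[w])
    return correct / len(gold)

gold = {"pilotato": 1, "lucciola": 1, "ape": 1, "rampante": 1,
        "brama": 0, "processare": 0}
predictions = {"pilotato": 1, "lucciola": 0, "ape": 1, "rampante": 1,
               "brama": 0, "processare": 1}
print(accuracy(gold, predictions))  # 4 correct out of 6
```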
        </sec>
      </sec>
      <sec id="sec-9-2">
        <title>Baselines</title>
        <p>We provided two baseline models:
• Frequencies: the absolute value of the
difference between the word frequencies in the
two sub-corpora;
• Collocations: for each word, we build two
vector representations consisting of the
Bag-of-Collocations (BoC) related to the two different
time periods (T1 and T2). Then, we compute
the cosine similarity between the two BoCs.</p>
        <p>
          It is the same approach evaluated in
          <xref ref-type="bibr" rid="ref3">(Basile
et al., 2019)</xref>
          .
        </p>
        <p>In both baselines, we use a threshold to predict
whether a word has changed its meaning. For the
frequency baseline, a change is detected when the
difference is higher than the average; for the
collocation baseline, a semantic change occurs when the
similarity between the two time periods drops below the
average plus the variance. Both the average and the
variance are computed on the set of target words.</p>
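        <p>Both baselines can be sketched as follows, assuming precomputed frequency counts and Bag-of-Collocations vectors (all counts and vectors below are invented for illustration):</p>

```python
import numpy as np

def frequency_baseline(freq_c1, freq_c2):
    """Change (1) if |frequency difference| exceeds the average difference."""
    diffs = {w: abs(freq_c1[w] - freq_c2[w]) for w in freq_c1}
    mean = sum(diffs.values()) / len(diffs)
    return {w: int(d > mean) for w, d in diffs.items()}

def collocation_baseline(boc_c1, boc_c2):
    """Change (1) if BoC cosine similarity drops below mean + variance."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    sims = {w: cos(boc_c1[w], boc_c2[w]) for w in boc_c1}
    values = np.array(list(sims.values()))
    threshold = values.mean() + values.var()
    return {w: int(s < threshold) for w, s in sims.items()}

freq_c1 = {"ape": 100, "brama": 80}
freq_c2 = {"ape": 500, "brama": 90}
print(frequency_baseline(freq_c1, freq_c2))  # {'ape': 1, 'brama': 0}
```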
        <p>Figure 1: Number of false positives and false negatives for each system.</p>
      </sec>
      <sec id="sec-9-3">
        <title>System</title>
        <p>OP-IMS
UWB Team
CIC-NLP
UNIMIB
QMUL-SDS
VI-IMS
CL-IMS
unipd
SBM-IMS</p>
      </sec>
      <sec id="sec-9-4">
        <title>Type</title>
        <p>Post-alignement
Post-alignement
PoS tag features
Jointly alignment
Jointly alignment
Jointly alignment
Contextual Embeddings
Contextual Embeddings</p>
        <p>Graph</p>
        <p>Table 2: Systems types.
5</p>
        <p>
          21 teams registered for the DIACR-Ita task.
However, 9 teams participated in the final task, for a
total of 36 submitted runs. Based on the algorithms
employed, we can group systems into five
categories: Post-alignment, Joint Alignment,
Contextual Embeddings, Graph-based, and PoS tag
features (see Table 2). The first two classes are
characterised by the type of alignment used.
Post-alignment systems first train static word
embeddings for each time period, and then align them.
Joint Alignment systems train word embeddings
and jointly align vectors across all time slices.
Contextual Embeddings systems use
contextualised embeddings, such as BERT
          <xref ref-type="bibr" rid="ref9">(Devlin et al.,
2019)</xref>
          , while Graph-based systems rely on graph
algorithms. The PoS tag features system relies on the
distribution of the targets’ PoS tags across the two time
periods. The majority of participating systems use
cosine distance as a measure of semantic change,
i.e. they compute the cosine distance between the
vectors of the target lemmas across time periods.
Other systems use the Average Pairwise Cosine
Distance or the Average Canberra Distance, since
the plain cosine distance does not fit contextual
embedding representations. The last group of systems
uses graph-based measures.
        </p>
        <p>
          We report a short description of each team (best
submission) as follows:
OP-IMS
          <xref ref-type="bibr" rid="ref16">(Kaiser et al., 2020)</xref>
          This team uses the
Skip-gram with Negative Sampling
(SGNS) model to compute word embeddings;
the resulting matrices are mean-centred. Word
embeddings are aligned using Orthogonal
Procrustes. They choose cosine similarity to
compare vectors of different word spaces and
a threshold based on the mean and standard
deviation to classify target words.
        </p>
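        <p>Orthogonal Procrustes alignment, common to the post-alignment systems, can be sketched as follows (an illustrative implementation, not the team’s code; the matrices are synthetic, and mean-centring is assumed to have been applied beforehand):</p>

```python
import numpy as np

def orthogonal_procrustes_align(X, Y):
    """Rotate X onto Y: Q = argmin ||XQ - Y||_F over orthogonal Q,
    obtained from the SVD of X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    Q = U @ Vt
    return X @ Q

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))                  # embeddings for period T1
R = np.linalg.qr(rng.normal(size=(20, 20)))[0]  # a random rotation
Y = X @ R                                       # T2 space = rotated T1 space
X_aligned = orthogonal_procrustes_align(X, Y)
print(np.allclose(X_aligned, Y))  # True: the rotation is fully recovered
```

        <p>After alignment, vectors of the same lemma in the two spaces can be compared directly, e.g. with cosine similarity.</p>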
        <p>
          UWB Team
          <xref ref-type="bibr" rid="ref2 ref21">(Prazˇ a´k et al., 2020)</xref>
          The team maps
semantic spaces using linear transformations,
such as Canonical Correlation Analysis and
Orthogonal Transformation and cosine
similarity as a measure to decide if a target word
is stable or not. They use a threshold based
on mean.
        </p>
        <p>
          CIC-NLP
          <xref ref-type="bibr" rid="ref2">(Angel et al., 2020)</xref>
          This team
analyses the part-of-speech distribution over the
two corpora and creates vectors with
information about the most common word PoS
tags. Then, they obtain a score for each target using
pairs of vectors from the two time periods and the sum
of the Euclidean, Manhattan, and cosine distances.
They rank targets in descending order. Finally,
they label the top third of the ranked targets as changed
words.
        </p>
        <p>
          UNIMIB
          <xref ref-type="bibr" rid="ref6">(Belotti et al., 2020)</xref>
          The team creates
temporal word embeddings using Temporal
Word Embeddings with a Compass (TWEC)
          <xref ref-type="bibr" rid="ref10">(Di Carlo et al., 2019)</xref>
          . They use the move
measure, i.e. a weighted linear
combination of the cosine similarity and the Local
Neighbors measure introduced by
          <xref ref-type="bibr" rid="ref13">(Hamilton et al., 2016)</xref>
          . They
label targets as stable if the move measure is
greater than 0.7.
        </p>
        <p>
          QMUL-SDS
          <xref ref-type="bibr" rid="ref1">(Alkhalifa et al., 2020)</xref>
          The team
uses TWEC
          <xref ref-type="bibr" rid="ref10">(Di Carlo et al., 2019)</xref>
          to
compute temporal word embeddings, using the TWEC
C-BoW (Continuous Bag of Words) model with
default settings. They use cosine similarity
as a measure of change and a threshold based
on the mean.
        </p>
        <p>
          VI-IMS The team uses SGNS to create word
embeddings exploiting Vector Initialization
          <xref ref-type="bibr" rid="ref17">(Kim et al., 2014)</xref>
          . They use cosine
distance as a measure of semantic change and a
threshold based on the mean and the standard
deviation to classify target words.
        </p>
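        <p>The mean-plus-standard-deviation thresholding used by several of these systems can be sketched as follows (the distance values are invented for illustration, and “campanello” is a made-up stable target):</p>

```python
import numpy as np

def classify_by_threshold(distances):
    """Label a target as changed (1) if its cosine distance exceeds the
    mean + standard deviation of the distances over all targets."""
    values = np.array(list(distances.values()))
    threshold = values.mean() + values.std()
    return {w: int(d > threshold) for w, d in distances.items()}

distances = {"tac": 0.05, "campanello": 0.10, "imbarcata": 0.90}
print(classify_by_threshold(distances))
# {'tac': 0, 'campanello': 0, 'imbarcata': 1}
```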
        <p>
          CL-IMS
          <xref ref-type="bibr" rid="ref19">(Laicher et al., 2020)</xref>
          The team creates
word vectors using different combinations of
the first and last four layers of BERT. They
rank targets according to Average Pairwise
Cosine Distance, and label the first 7 targets
as changed words.
unipd
          <xref ref-type="bibr" rid="ref7">(Benyou et al., 2020)</xref>
          This team uses
contextualised word embeddings and a linear
combination of distance metrics to
measure semantic change, namely Euclidean
distance, Average Canberra distance, and Hausdorff
distance, as well as the Jensen–Shannon
divergence between cluster distributions. They
rank targets according to the obtained score,
and label the first half as changed words.
SBM-IMS The team computes token vectors using
BERT. They create a graph where the vertices
are the vectors extracted from BERT, while
the edges are weighted by the cosine distance between
word vectors. They cluster the graph with a
Weighted Stochastic Block Model. Then,
they consider the number of incoming edges
from the first and second period as a measure
of semantic change.
Table 3 reports the final results. The best result
was achieved by two systems, OP-IMS and
UWB Team, both of which exploit a post-alignment
strategy. The next best system, CIC-NLP, uses an
approach based on PoS tag features. UNIMIB,
QMUL-SDS, and VI-IMS are based on joint
alignment, while CL-IMS and unipd use contextual
embeddings, and SBM-IMS is the only graph-based
approach. Moreover, we report both false
negatives and false positives in Figure 1. Both
post-alignment systems share the same unique false
negative, the target “tac”, while CIC-NLP detects
two false positives. Joint-alignment systems have
a number of false positives higher than, or equal
to, their number of false negatives. CL-IMS and
unipd produce 2 and 3 false negatives
respectively, and both misclassify three stable words. The
only graph-based approach, SBM-IMS, reports the
highest number of false positives. In conclusion,
the results show that systems based on post/joint
alignment and PoS tag features achieve the best
performance, while contextual embeddings do not
perform as well in this type of task. However, all
the systems outperform both baselines.
        </p>
        <sec id="sec-9-4-1">
          <title>Conclusions</title>
          <p>We proposed for the first time the “Diachronic
Lexical Semantics” (DIACR-Ita) task. The goal
of the task is to develop systems able to
automatically detect whether a given word has changed its
meaning over time, given contextual information from
corpora. We created two corpora for two
different time periods, T1 and T2, and we manually
annotated a set of target words that do or do not
change meaning across these two periods. This
is the first Italian dataset of this type. 9 teams
participated in the task, for a total of 36
submitted runs. All the systems outperform
the two baselines. The results suggest that
methods based on post-alignment are the most suitable
for this type of task, resulting in better
performance even when compared to contextual
embedding methods such as BERT.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Rabab</given-names>
            <surname>Alkhalifa</surname>
          </string-name>
          , Adam Tsakalidis, Arkaitz Zubiaga, and
          <string-name>
            <given-names>Maria</given-names>
            <surname>Liakata</surname>
          </string-name>
          .
          <year>2020</year>
          .
<article-title>QMUL-SDS @ DIACR-Ita: Evaluating Unsupervised Diachronic Lexical Semantics Classification in Italian</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
<source>Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020), Online</source>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
<string-name>
            <given-names>Jason</given-names>
            <surname>Angel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Carlos A.</given-names>
            <surname>Rodriguez-Diaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Gelbukh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Sergio</given-names>
            <surname>Jimenez</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>CIC-NLP @ DIACR-Ita: POS and Neighbor Based Models for Lexical Semantic Change in Diachronic Italian Corpora</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
<source>Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020), Online</source>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          , Giovanni Semeraro, and
          <string-name>
            <given-names>Annalina</given-names>
            <surname>Caputo</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Kronos-it: A dataset for the Italian semantic change detection task</article-title>
          .
          <source>In CEUR Workshop Proceedings</source>
          , volume
          <volume>2481</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
, Annalina Caputo, Tommaso Caselli, Pierluigi Cassotti, and
          <string-name>
            <given-names>Rossella</given-names>
            <surname>Varvara</surname>
          </string-name>
          .
          <year>2020a</year>
          .
          <article-title>A Diachronic Italian Corpus based on “L'Unita`”</article-title>
          .
          <source>In CEUR Workshop Proceedings.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Danilo Croce, Maria Di Maro, and
<string-name>
            <given-names>Lucia C.</given-names>
            <surname>Passaro</surname>
          </string-name>
          . 2020b.
          <article-title>EVALITA 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for Italian</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online</source>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Federico</given-names>
            <surname>Belotti</surname>
          </string-name>
          , Federico Bianchi, and
          <string-name>
            <given-names>Matteo</given-names>
            <surname>Palmonari</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>UNIMIB @ DIACR-Ita: Aligning Distributional Embeddings with a Compass for Semantic Change Detection in the Italian Language</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
<source>Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020), Online</source>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
<string-name>
            <given-names>Benyou</given-names>
            <surname>Wang</surname>
          </string-name>
          , Emanuele Di Buccio, and
          <string-name>
            <given-names>Massimo</given-names>
            <surname>Melucci</surname>
          </string-name>
          .
          <year>2020</year>
.
          <article-title>University of Padova @ DIACR-Ita</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
<source>Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020), Online</source>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
<string-name>
            <given-names>Marco</given-names>
            <surname>Del Tredici</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Malvina</given-names>
            <surname>Nissim</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Zaninello</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Tracing metaphors in time through self-distance in vector spaces</article-title>
          .
          <source>In CEUR Workshop Proceedings. 3rd Italian Conference on Computational Linguistics, CLiC-it 2016, and 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, EVALITA 2016</source>
          ; conference dates: 5-7 December 2016.
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers), pages
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          , Minneapolis, Minnesota, June. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
<string-name>
            <given-names>Valerio</given-names>
            <surname>Di Carlo</surname>
          </string-name>
          , Federico Bianchi, and
          <string-name>
            <given-names>Matteo</given-names>
            <surname>Palmonari</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Training temporal word embeddings with a compass</article-title>
          .
          <source>In Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>33</volume>
          , pages
          <fpage>6326</fpage>
          -
          <lpage>6334</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Haim</given-names>
            <surname>Dubossarsky</surname>
          </string-name>
          , Simon Hengchen, Nina Tahmasebi, and
          <string-name>
            <given-names>Dominik</given-names>
            <surname>Schlechtweg</surname>
          </string-name>
          .
          <year>2019</year>
          .
<article-title>Time-Out: Temporal Referencing for Robust Modeling of Lexical Semantic Change</article-title>
          .
          <source>In 57th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>457</fpage>
          -
          <lpage>470</lpage>
          .
Association for Computational Linguistics (ACL), sep
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Hila</given-names>
            <surname>Gonen</surname>
          </string-name>
, Ganesh Jawahar, Djamé Seddah, and
          <string-name>
            <given-names>Yoav</given-names>
            <surname>Goldberg</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Simple, interpretable and stable method for detecting words with usage change across corpora</article-title>
          .
          <source>In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>538</fpage>
          -
          <lpage>555</lpage>
          , Online, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
<string-name>
            <given-names>William L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          , Jure Leskovec, and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          .
          <year>2016</year>
          .
<article-title>Diachronic word embeddings reveal statistical laws of semantic change</article-title>
          .
          <source>In 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers</source>
          , volume
          <volume>3</volume>
          , pages
          <fpage>1489</fpage>
          -
          <lpage>1501</lpage>
          , may.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Willem</given-names>
            <surname>Hollmann</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Semantic change</article-title>
          .
          <source>In English Language: Description, Variation and Context</source>
          , pages
          <fpage>301</fpage>
          -
          <lpage>313</lpage>
          . Basingstoke: Palgrave.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Renfen</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Shen</given-names>
            <surname>Li</surname>
          </string-name>
          ,
and
          <string-name>
            <given-names>Shichen</given-names>
            <surname>Liang</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Diachronic Sense Modeling with Deep Contextualized Word Embeddings: An Ecological View</article-title>
          .
          <source>In 57th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>3899</fpage>
          -
          <lpage>3908</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Jens</given-names>
            <surname>Kaiser</surname>
          </string-name>
          , Dominik Schlechtweg, and Sabine Schulte Im Walde.
          <year>2020</year>
          .
          <article-title>OP-IMS @ DIACR-Ita: Back to the Roots: SGNS+OP+CD still rocks Semantic Change Detection</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
<source>Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020), Online</source>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
<string-name>
            <given-names>Yoon</given-names>
            <surname>Kim</surname>
          </string-name>
          , Yi-I Chiu, Kentaro Hanaki, Darshan Hegde, and
          <string-name>
            <given-names>Slav</given-names>
            <surname>Petrov</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Temporal analysis of language through neural language models</article-title>
          .
          <source>In Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science</source>
          , pages
          <fpage>61</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Andrey</given-names>
            <surname>Kutuzov</surname>
          </string-name>
          , Lilja Øvrelid, Terrence Szymanski, and
          <string-name>
            <given-names>Erik</given-names>
            <surname>Velldal</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Diachronic word embeddings and semantic shifts: a survey</article-title>
          .
          <source>27th International Conference on Computational Linguistics.</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Severin</given-names>
            <surname>Laicher</surname>
          </string-name>
          , Dominik Schlechtweg, Gioia Baldissin, Enrique Castaneda, and Sabine Schulte Im Walde.
          <year>2020</year>
          .
          <article-title>CL-IMS @ DIACR-Ita: Volente o Nolente: BERT does not outperform SGNS on Semantic Change Detection</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
<source>Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020), Online</source>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Verena</given-names>
            <surname>Lyding</surname>
          </string-name>
          , Egon Stemle, Claudia Borghetti, Marco Brunello, Sara Castagnoli, Felice Dell'Orletta, Henrik Dittmann, Alessandro Lenci, and
          <string-name>
            <given-names>Vito</given-names>
            <surname>Pirrelli</surname>
          </string-name>
          .
          <year>2014</year>
          .
<article-title>The PAISÀ corpus of Italian web texts</article-title>
          .
          <source>In 9th Web as Corpus Workshop (WaC-9)@ EACL</source>
          <year>2014</year>
          , pages
          <fpage>36</fpage>
          -
          <lpage>43</lpage>
. EACL (European Chapter of the Association for Computational Linguistics).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
<string-name>
            <given-names>Ondřej</given-names>
            <surname>Pražák</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Pavel</given-names>
            <surname>Přibáň</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Taylor</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>UWB @ DIACR-Ita: Lexical Semantic Change Detection with CCA and Orthogonal Transformation</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
<source>Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020), Online</source>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Maja</given-names>
            <surname>Rudolph</surname>
          </string-name>
          and
          <string-name>
            <given-names>David</given-names>
            <surname>Blei</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Dynamic Embeddings for Language Evolution</article-title>
          .
          <source>In WWW '18: Proceedings of the 2018 World Wide Web Conference</source>
          , pages
          <fpage>1003</fpage>
          -
          <lpage>1011</lpage>
          .
Association for Computing Machinery (ACM).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Dominik</given-names>
            <surname>Schlechtweg</surname>
          </string-name>
          ,
<string-name>
            <given-names>Barbara</given-names>
            <surname>McGillivray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Simon</given-names>
            <surname>Hengchen</surname>
          </string-name>
          , Haim Dubossarsky, and
          <string-name>
            <given-names>Nina</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Semeval-2020 task 1: Unsupervised lexical semantic change detection</article-title>
. arXiv preprint arXiv:2007.11464.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Nina</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Risse</surname>
          </string-name>
          .
          <year>2017</year>
          .
<article-title>Finding Individual Word Sense Changes and their Delay in Appearance</article-title>
          .
          <source>In International Conference Recent Advances in Natural Language Processing</source>
          , pages
          <fpage>741</fpage>
          -
          <lpage>749</lpage>
          . Assoc. for Computational Linguistics Bulgaria, nov.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>Nina</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          , Lars Borin, and
          <string-name>
            <given-names>Adam</given-names>
            <surname>Jatowt</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Survey of Computational Approaches to Lexical Semantic Change</article-title>
          . 1st International Workshop on Computational Approaches to Historical Language Change
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>Xuri</given-names>
            <surname>Tang</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>A state-of-the-art of semantic change computation</article-title>
          .
          <source>Natural Language Engineering</source>
          ,
          <volume>24</volume>
          (
          <issue>5</issue>
          ):
          <fpage>649</fpage>
          -
          <lpage>676</lpage>
          , sep.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>Elizabeth</given-names>
            <surname>Closs Traugott</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Semantic change: Bleaching, strengthening, narrowing, extension</article-title>
          .
          <source>In Encyclopedia of Language and Linguistics</source>
          . Elsevier.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <given-names>Zijun</given-names>
            <surname>Yao</surname>
          </string-name>
          , Yifan Sun, Weicong Ding,
          <string-name>
            <given-names>Nikhil</given-names>
            <surname>Rao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Hui</given-names>
            <surname>Xiong</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Dynamic word embeddings for evolving semantic discovery</article-title>
          .
          <source>In WSDM 2018 - Proceedings of the 11th ACM International Conference on Web Search and Data Mining</source>
, volume
          <volume>2018-February</volume>
          , pages
          <fpage>673</fpage>
          -
          <lpage>681</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>