<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantic Shift Detection in Vatican Publications: a Case Study from Leo XIII to Francis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Silvana Castano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alfio Ferrara</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Montanelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Periti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università degli Studi di Milano Department of Computer Science Via Celoria</institution>
          ,
          <addr-line>18 - 20133 Milano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In recent years, word embedding models have been proposed to effectively detect language change and semantic shift in diachronic corpora. In this paper, we present a comparative analysis of different word embedding approaches through a case study based on an Italian diachronic corpus of Vatican publications by popes from Leo XIII to Francis (1898-2020). Four different approaches are considered, characterized by the adoption of different embedding models, each trained over the publications of a specific pope. The paper aims to explore whether and how word embedding techniques are successful in detecting semantic shifts in the language used by popes.</p>
      </abstract>
      <kwd-group>
        <kwd>Computational Humanities</kwd>
        <kwd>Word Embeddings</kwd>
        <kwd>Semantic Shift Detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In recent years, the use of machine learning models in the field of Computational History has been
gaining increasing attention [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In particular, the application of word embedding techniques
to the analysis of historical corpora is providing interesting and promising research results [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
However, when historical corpora span different time periods, a number of linguistic issues can
emerge. A word can evolve across the years by acquiring or losing meanings, or by changing the
context in which it is employed. For example, the word gay shifted from meaning ‘cheerful’ to
‘homosexual’ during the 20th century, and the word girl meant ‘young person of either
gender’ in the past [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We refer to this process as semantic shift. Although in the past decades
the automatic detection of semantic shift has already been investigated through data-driven
approaches [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ], solutions based on word embedding models are currently being proposed.
They are characterized by i) time-oriented splitting of a diachronic corpus into
sub-corpora in which a coherent language without semantic shifts can be assumed, and ii)
comparison of the word embeddings derived from the sub-corpora to capture the semantic shift of
words across different time periods. These approaches leverage the idea that semantically related
words are close to one another in the embedding space [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. However, word embeddings
from different temporal vector spaces cannot be directly compared due to the stochastic
nature of their training. Consequently, different approaches have been proposed to enable the
comparison of embeddings across different models.
      </p>
      <p>Motivations. In this paper, we present a comparative analysis of different word embedding
approaches by exploiting a diachronic corpus of Vatican publications from Leo XIII to
Francis (1898-2020). The goal of the work is twofold. On the one hand, we aim to explore whether
and how word embedding techniques are successful in detecting semantic shifts in official,
real-world documents that address a large audience over a long time period. Moreover,
the paper aims to compare and discuss the effectiveness of different approaches from the literature
in capturing semantic shifts on a corpus of limited size and highly unbalanced nature like the
Vatican publications corpus. On the other hand, the corpus of Vatican publications is
a textual dataset of great interest, not only because of its exceptional historical depth,
but also for two reasons related to the nature of Vatican documents. The
first reason is that the Catholic Church, through the writings of its popes, has always dealt
with the most relevant issues in the public debate of its time, alongside the themes of faith and
worship. Therefore, these writings constitute a historical source of primary importance for
reconstructing an important part of human cultural history. The second reason is the
presence in the writings of the Holy See of terms and concepts that are characterized by
little semantic shift over time, alongside others that have instead changed remarkably both
in relevance and in context. The former are mainly terms referring to the dogmas of
faith which, albeit with some variations, have essentially remained stable in the discourse of the
popes. Conversely, the latter are terms that describe well the way in which the attention
of public discourse has shifted over time to different topics, such as the environment, the role
of science, and many events of human history. For these reasons, the corpus of
Vatican documents is an ideal laboratory for experimenting with semantic
shift detection techniques, and this work constitutes a first step in the investigation of this very rich heritage
of human culture.</p>
      <p>The paper is organized as follows. In Section 2, we discuss the related work. In Section 3, we
present our case study on Vatican publications. The methods used for the case study analysis are
described in Section 4. The results of the case study are presented in Section 5. Finally, in Section 6, we
provide our concluding remarks.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        As a general remark, word embedding approaches to semantic shift detection are based on
time-sliced corpora and separate embedding models. The comparison of different word
representations over time (one per model) is performed through a similarity measure such as
the cosine or Jaccard similarity. A simple Non-Aligned (NA) method for semantic
shift detection is proposed in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], where the change in the use of a word over time is detected through
the analysis of the word context in different time periods. In particular, the idea is to consider
the top-<inline-formula><tex-math>k</tex-math></inline-formula> neighbors of a word in each temporal embedding model and to measure the overlap
of these lists, on the assumption that smaller overlaps indicate more drastic changes. However, an alternative
and more typical solution is to align word representations (i.e., embeddings)
that live in different temporal spaces before comparing them. In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], an Incremental Update
(IU) mechanism of the embedding models is proposed. After a model is trained on a first period,
it is then updated with data from the following time periods by saving its state as a new period
model each time. In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], the idea is to align embedding models to a unique vector space using
heuristic local alignments per word, based on the assumption that the set of nearest words in
the embedding space changes for words that undergo a shift. Then, changes between periods are
detected through a distance-based distributional time series for each word in the corpus. The idea
of using a similar transformation in the temporal correspondence problem is proposed in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
where, given an input term (e.g., iPod) and a target time (e.g., 1980s), the task is to predict
the counterpart of the query that existed in the target time (e.g., walkman). The approach
in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] relies on orthogonal Procrustes (PR) as a global alignment mechanism for temporal
embedding spaces within an evaluation framework of different embedding techniques for detecting
semantic shifts. Further studies attempted to combine the information captured by the embedding
models with the frequency of changes for capturing word shifts (e.g., [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ]).
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], the idea of creating dynamic embedding models is proposed where data across all the
time periods are shared so that there is no need to align embedding spaces trained on separate
sub-corpora. A Bayesian version of the skip-gram model with a latent time series as prior is
proposed in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Similarly, in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], the authors propose to extend the skip-gram model by
modeling time as a continuous variable. In [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], a different approach is presented in which
word embeddings for each time period are not first learned and then aligned, but rather learned
and aligned at the same time. As a further approach, the idea of [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] is to train embeddings on
a corpus as a whole while tagging some words of interest with a special tag that indicates which
corpus they come from. As a result, an individual time-dependent embedding is created for each
target word. To avoid aligning embeddings through orthogonal transformations, in [18],
the authors propose to compute Second-Order embeddings (SO), namely embeddings that
share the same temporal space since they are obtained by modeling the meaning of words by means of
their semantic similarity relations with all the other words in the vocabulary.
      </p>
      <p>As a final remark, we note that there is increasing interest in the use of
contextualised pre-trained models for semantic shift detection [19, 20]. However, in this paper, such
approaches are not considered since recent comparisons show that static embedding models,
like Word2Vec, outperformed the contextualised ones for semantic shift detection [21].</p>
    </sec>
    <sec id="sec-3">
      <title>3. The Vatican corpus</title>
      <p>The considered corpus of Vatican publications contains 27,831 documents extracted from the
digital archive of the Vatican website<sup>1</sup>. The corpus consists of all the documents available on the web
at download time, from Leo XIII to Francis (1878-2020). The popes represent a natural
criterion for splitting the corpus over time, meaning that a separate sub-corpus is defined
for each pope with the associated publications. Furthermore, we stress that the documents have
been downloaded in Italian. This choice is motivated as follows:</p>
      <list list-type="bullet">
        <list-item><p>The documents on the Vatican website are available in various languages, including Italian,
Latin, English, Spanish, and German. We decided to work with the Italian language since
the largest number of documents can be obtained in this language (consider that only 14,384
documents are available in English).</p></list-item>
        <list-item><p>In addition, although the official language of the Holy See is Latin, some of the available
texts are not official documents of the Catholic Church in the strict sense (such as encyclicals, apostolic
constitutions, letters, or exhortations), but documents of minor
dogmatic importance (e.g., homilies, audiences, messages, biographies). Again, the number
of papal publications available in Latin (i.e., 5,027 texts) is far
smaller than the number of Italian documents.</p></list-item>
      </list>
      <p>A summary description of the considered Vatican corpus is provided in Table 1. Tokens
represent the text units (i.e., words, terms) extracted from the Vatican documents after a text
lowercasing step. As a further feature of the considered Vatican corpus, we note that the size of
the sub-corpora of the popes varies from a few documents (e.g., 19 documents from John Paul
I) up to several thousand (e.g., 15,307 documents from John Paul II), meaning that the overall
dataset is an example of an unbalanced corpus.</p>
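      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Pope</th></tr>
          </thead>
          <tbody>
            <tr><td>Leo XIII</td></tr>
            <tr><td>Pius X</td></tr>
            <tr><td>Benedict XV</td></tr>
            <tr><td>Pius XI</td></tr>
            <tr><td>Pius XII</td></tr>
            <tr><td>John XXIII</td></tr>
            <tr><td>Paul VI</td></tr>
            <tr><td>John Paul I</td></tr>
            <tr><td>John Paul II</td></tr>
            <tr><td>Benedict XVI</td></tr>
            <tr><td>Francis</td></tr>
          </tbody>
        </table>
      </table-wrap>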
    </sec>
    <sec id="sec-4">
      <title>4. Methods for the case study analysis</title>
      <p>
        In this paper, we consider four different approaches from the literature to semantic shift detection and
apply them to the Vatican corpus. In particular, we selected a non-aligned (NA) approach [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and
three different aligned solutions that make the temporal vector spaces of different time
periods comparable, namely Procrustes (PR) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], Incremental Updates (IU) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and Second Order Embeddings
(SO) [18]. Our comparative analysis consists of the following data processing steps:
time-oriented splitting, word embeddings construction and alignment, and semantic shift detection.
      </p>
      <p>Time-oriented splitting. The Vatican corpus is split by creating a separate sub-corpus for
each pontificate. Due to the short pontificate of John Paul I and the scarcity of documents from Pius
X and Pius XI, we group their documents with those of the immediately preceding
popes. As a result, we merge the documents of John Paul I with those of Paul VI, the documents
of Pius XI with those of Benedict XV, and the documents of Pius X with those of Leo XIII.
      </p>
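      <p>As an illustration, the grouping described above can be sketched in a few lines of Python. This is a minimal sketch rather than the pipeline actually used: the function name, the (pope, text) input format, and the toy documents are assumptions introduced here.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict

# Popes whose documents are merged into the sub-corpus of the
# immediately preceding pope, as described in the text.
MERGE_INTO = {
    "Pius X": "Leo XIII",
    "Pius XI": "Benedict XV",
    "John Paul I": "Paul VI",
}


def split_by_pontificate(documents):
    """Group (pope, text) pairs into one sub-corpus per pontificate."""
    sub_corpora = defaultdict(list)
    for pope, text in documents:
        # Redirect merged popes to their predecessor, keep the others as-is.
        sub_corpora[MERGE_INTO.get(pope, pope)].append(text)
    return dict(sub_corpora)


# Toy input with placeholder texts (hypothetical data).
docs = [
    ("Leo XIII", "doc a"),
    ("Pius X", "doc b"),
    ("Paul VI", "doc c"),
    ("John Paul I", "doc d"),
    ("Francis", "doc e"),
]
sub_corpora = split_by_pontificate(docs)
```
]]></preformat>
      <p>With this mapping, the eleven popes of the corpus yield eight sub-corpora, one per embedding model.</p>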
      <p>Word embeddings construction. For each one of the considered approaches (i.e., NA, PR,
IU, SO), we train 100-dimensional word embeddings over each sub-corpus by exploiting
Gensim’s implementation of Word2Vec.<sup>2</sup></p>
      <p>Word embeddings alignment. For the three aligned solutions, the alignment of embeddings
belonging to separate vector spaces is executed as follows.</p>
      <p>Procrustes (PR). We perform a cross-time alignment through the Procrustes implementation
available at www.github.com/williamleif/histwords. The Procrustes assumption is that each
word space has axes similar to the axes of the other word spaces, and that two word spaces
differ only by a rotation of the axes:</p>
      <p>
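        <disp-formula><tex-math>R^{(t)} = \arg\min_{Q^\top Q = I} \left\| W^{t} Q - W^{t+1} \right\|_F</tex-math></disp-formula>
where <inline-formula><tex-math>W^{t}</tex-math></inline-formula> and <inline-formula><tex-math>W^{t+1}</tex-math></inline-formula> are the matrices of word embeddings learned at year <inline-formula><tex-math>t</tex-math></inline-formula> and <inline-formula><tex-math>t+1</tex-math></inline-formula>, respectively,
and <inline-formula><tex-math>Q</tex-math></inline-formula> is an orthogonal matrix that minimizes the Frobenius norm of the difference between
<inline-formula><tex-math>W^{t} Q</tex-math></inline-formula> and <inline-formula><tex-math>W^{t+1}</tex-math></inline-formula> [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].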
      </p>
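      <p>As a complementary note, the orthogonal Procrustes problem has a well-known closed-form solution. The following is a sketch under two assumptions that are not stated in the text: the two embedding matrices store one word vector per row, and both are restricted to the same shared vocabulary in the same row order. Writing the singular value decomposition of the cross-product of the two matrices, the optimal rotation is obtained by dropping the singular values:
        <disp-formula><tex-math>\left(W^{t}\right)^\top W^{t+1} = U \Sigma V^\top \quad \Longrightarrow \quad R^{(t)} = U V^\top</tex-math></disp-formula>
Since the resulting matrix is orthogonal, applying it to the embeddings of period <inline-formula><tex-math>t</tex-math></inline-formula> leaves all the cosine similarities within that period unchanged; only the orientation of the space is modified.</p>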
      <p>
        Incremental Updates (IU). We first train a model on the sub-corpus of Leo XIII (the
first pope in the dataset in chronological order), and then we update the model with the data of each subsequent pope,
saving its state each time as a new pope model. Each model <inline-formula><tex-math>M_{t+1}</tex-math></inline-formula> is initialized with the word
vectors from the previous model <inline-formula><tex-math>M_{t}</tex-math></inline-formula> [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
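      <p>The initialization step of the incremental update can be illustrated as follows. This is a minimal sketch of the initialization only, with the subsequent training on the new sub-corpus omitted; the function name, the dictionary-based model representation, and the random initialization of unseen words are assumptions made here for illustration, not the library's actual procedure.</p>
      <preformat><![CDATA[
```python
import random


def init_next_model(prev_vectors, period_vocab, dim=100, seed=0):
    """Initialise the vectors of the next model from the previous one.

    Every word known to the previous model keeps its vector as a starting
    point; words first seen in the new period get a small random vector
    (an arbitrary choice made for this sketch).
    """
    rng = random.Random(seed)
    # Copy the vectors so that the saved previous model is not modified.
    next_vectors = {word: list(vec) for word, vec in prev_vectors.items()}
    for word in sorted(period_vocab):
        if word not in next_vectors:
            next_vectors[word] = [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]
    return next_vectors


# Toy 3-dimensional model for the first period (hypothetical values).
m_first = {"church": [0.1, 0.2, 0.3], "heresy": [0.4, 0.5, 0.6]}
# "heresy" does not occur in the new period, "environment" is new.
m_next = init_next_model(m_first, {"church", "environment"}, dim=3)
```
]]></preformat>
      <p>In this sketch, a word that no longer occurs in the new period simply keeps its previous vector, since further training would not update it.</p>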
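      <p>Second Order Embeddings (SO). As proposed in [18], we build second order embeddings by
modeling words by means of their semantic similarity relations with all the other words
in the vocabulary. Denoting the embedding of a word <inline-formula><tex-math>w</tex-math></inline-formula> at time period <inline-formula><tex-math>t</tex-math></inline-formula> as <inline-formula><tex-math>w^{(t)} \in \mathbb{R}^{100}</tex-math></inline-formula>, we
consider the vectors:
        <disp-formula><tex-math>\tilde{w}^{(t)} = \left( \mathrm{sim}\left(w^{(t)}, w_1^{(t)}\right), \ldots, \mathrm{sim}\left(w^{(t)}, w_{|V|}^{(t)}\right) \right)</tex-math></disp-formula>
where <inline-formula><tex-math>V</tex-math></inline-formula> is a common vocabulary of all the words in all the time periods, <inline-formula><tex-math>w_i^{(t)}</tex-math></inline-formula> is the embedding of the <inline-formula><tex-math>i</tex-math></inline-formula>-th word of <inline-formula><tex-math>V</tex-math></inline-formula> at time period <inline-formula><tex-math>t</tex-math></inline-formula>, and sim is a similarity
function such as the cosine similarity. For computational purposes, we define the common
vocabulary <inline-formula><tex-math>V</tex-math></inline-formula> by relying on mutual information values computed between words and classes
of text (e.g., encyclicals, apostolic exhortations, homilies) associated with each pope. For
each class of text, we select the top-500 words by mutual information score. Similarly to the
experiment performed in [18], we only keep nouns, adjectives, and verbs.
Furthermore, we exclude stopwords and words shorter than 4 characters.</p>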
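      <p>The construction of a second order embedding can be sketched as follows. This is an illustrative example, not the authors' code: the helper names and the toy two-dimensional models are assumptions, and the sketch presumes that every word of the common vocabulary has a non-zero vector in every model. The second toy model is the first one rotated by 90 degrees, to show that second order embeddings are unaffected by a rotation of the space.</p>
      <preformat><![CDATA[
```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def second_order(word, model, common_vocab):
    """Describe `word` by its similarity to each word of the common vocabulary."""
    return [cosine(model[word], model[v]) for v in common_vocab]


common_vocab = ["faith", "science"]
model_a = {"faith": [1.0, 0.0], "science": [0.0, 1.0], "environment": [1.0, 1.0]}
# model_b contains the same vectors as model_a, rotated by 90 degrees.
model_b = {"faith": [0.0, 1.0], "science": [-1.0, 0.0], "environment": [-1.0, 1.0]}

so_a = second_order("environment", model_a, common_vocab)
so_b = second_order("environment", model_b, common_vocab)
```
]]></preformat>
      <p>Although the raw vectors of the two toy models differ, the two second order embeddings coincide, which is why they can be compared across periods without an explicit alignment.</p>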
      <p>Semantic shifts detection. Word vectors from distinct time-sliced models cannot be directly
compared due to the stochastic nature of Word2Vec. This issue does not preclude the comparison
of the distances between pairs of words over time, which means that it is possible to compare the
semantic similarities of a pair of words in distinct models. For the sake of clarity, as an example,
we consider the case of temperature. Temperatures expressed in different scales, such as Celsius and
Kelvin, cannot be directly compared. They need to be aligned, i.e., one has to be converted to
the scale of the other. However, since the two scales differ only by an additive constant, we can directly
compare temperature differences computed in different scales.</p>
      <p><sup>2</sup> https://radimrehurek.com/gensim/</p>
      <p>Similarly, we decide to exploit:</p>
      <list list-type="order">
        <list-item><p>non-aligned embeddings to analyze the relative position of word pairs (i.e., the distance
between their vectors) in different vector spaces. With respect to the above example, this
corresponds to comparing temperature differences in different scales;</p></list-item>
        <list-item><p>aligned embeddings to analyze the position of a word over time (i.e., the distance between
the vector of that word and itself in distinct aligned vector spaces). With respect to the
above example, this corresponds to converting a temperature from one scale to another
before comparing it with a temperature expressed in the latter scale.</p></list-item>
      </list>
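      <p>Pairwise word similarity. We exploit non-aligned embeddings to compute the pairwise cosine
similarity between a pair of word vectors <inline-formula><tex-math>w_1</tex-math></inline-formula> and <inline-formula><tex-math>w_2</tex-math></inline-formula> across time in two different models <inline-formula><tex-math>M_i</tex-math></inline-formula>
and <inline-formula><tex-math>M_j</tex-math></inline-formula>. In particular, since the models are trained chronologically pope by pope (pontificates follow each
other over time without overlapping), the cosine similarity between two word vectors
can highlight the strength of their relationship from the perspective of different popes. Within each model <inline-formula><tex-math>M</tex-math></inline-formula>, the similarity is computed as:
        <disp-formula><tex-math>\cos\left(w_1^{M}, w_2^{M}\right) = \frac{w_1^{M} \cdot w_2^{M}}{\left\| w_1^{M} \right\| \, \left\| w_2^{M} \right\|}</tex-math></disp-formula>
      </p>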
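      <p>The pairwise comparison can be illustrated with a short example. This is a minimal sketch with toy three-dimensional vectors standing in for two independently trained models; the model names and the vector values are hypothetical. Each similarity is computed inside a single model, so no alignment is needed, and only the resulting scores are compared across popes.</p>
      <preformat><![CDATA[
```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Two non-aligned toy models (hypothetical values).
models = {
    "pope_i": {"science": [1.0, 0.0, 0.0], "technique": [0.0, 1.0, 0.0]},
    "pope_j": {"science": [1.0, 1.0, 0.0], "technique": [1.0, 1.0, 0.0]},
}

# One score per model: both vectors always come from the same space.
pairwise = {
    name: cosine(model["science"], model["technique"])
    for name, model in models.items()
}
```
]]></preformat>
      <p>In this toy setting the two words are unrelated in the first model and nearly identical in the second, which is the kind of increasing trend discussed for concepts from today.</p>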
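      <p>Word context comparison. We exploit non-aligned embeddings to explore the context of words.
Given a word <inline-formula><tex-math>w</tex-math></inline-formula>, we investigate the top-<inline-formula><tex-math>k</tex-math></inline-formula> words, namely the words whose vectors are the <inline-formula><tex-math>k</tex-math></inline-formula> closest to the
vector of <inline-formula><tex-math>w</tex-math></inline-formula> (i.e., the <inline-formula><tex-math>k</tex-math></inline-formula> most similar words to <inline-formula><tex-math>w</tex-math></inline-formula>) in each embedding model, where closeness
is measured by the cosine similarity. Besides learning how neighbors change over time for different popes, we estimate
the context similarity of a given word <inline-formula><tex-math>w</tex-math></inline-formula> between each pair of popes by computing the Jaccard
similarity score between the sets of the <inline-formula><tex-math>k</tex-math></inline-formula> most similar words to <inline-formula><tex-math>w</tex-math></inline-formula> in their respective models <inline-formula><tex-math>M_i</tex-math></inline-formula> and <inline-formula><tex-math>M_j</tex-math></inline-formula>:
        <disp-formula><tex-math>J\left(\text{top-}k_{M_i}, \text{top-}k_{M_j}\right) = \frac{\left| \text{top-}k_{M_i} \cap \text{top-}k_{M_j} \right|}{\left| \text{top-}k_{M_i} \cup \text{top-}k_{M_j} \right|}</tex-math></disp-formula>
      </p>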
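      <p>The context comparison can be sketched as follows. This is an illustrative example with toy two-dimensional models; the function names and the vector values are assumptions, and the target word itself is excluded from its own neighbor list.</p>
      <preformat><![CDATA[
```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(word, model, k):
    """Set of the k words most similar to `word` within a single model."""
    scored = [
        (cosine(model[word], vec), other)
        for other, vec in model.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return {other for _, other in scored[:k]}


def jaccard(a, b):
    """Jaccard similarity between two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Two non-aligned toy models (hypothetical values).
model_i = {
    "heresy": [1.0, 0.0],
    "apostasy": [0.9, 0.1],
    "condemnation": [0.8, 0.3],
    "peace": [0.0, 1.0],
}
model_j = {
    "heresy": [0.0, 1.0],
    "apostasy": [0.1, 0.9],
    "arianism": [0.2, 0.9],
    "peace": [1.0, 0.0],
}

context_i = top_k("heresy", model_i, k=2)
context_j = top_k("heresy", model_j, k=2)
score = jaccard(context_i, context_j)
```
]]></preformat>
      <p>Here the two contexts share one neighbor out of three distinct words, so the Jaccard score is 1/3. Since only word lists are compared, the different orientation of the two vector spaces is irrelevant.</p>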
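      <p>Self word similarity. Aligned embeddings are needed to compare a word with itself
over time. By relying on the cosine similarity, we detect meaning change independently of
neighboring words by considering the self similarity of a word <inline-formula><tex-math>w</tex-math></inline-formula> across consecutive time
models <inline-formula><tex-math>M_t</tex-math></inline-formula>, <inline-formula><tex-math>M_{t+1}</tex-math></inline-formula>:
        <disp-formula><tex-math>\cos\left(w^{M_t}, w^{M_{t+1}}\right) = \frac{w^{M_t} \cdot w^{M_{t+1}}}{\left\| w^{M_t} \right\| \, \left\| w^{M_{t+1}} \right\|}</tex-math></disp-formula>
      </p>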
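      <p>The self similarity of a word across consecutive models can be computed as a short time series. The following is a minimal sketch that assumes the models have already been aligned and are given in chronological order; the function name and the toy vectors are hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def self_similarity_series(word, aligned_models):
    """Similarity of `word` with itself for each pair of consecutive models."""
    return [
        cosine(prev[word], curr[word])
        for prev, curr in zip(aligned_models, aligned_models[1:])
    ]


# Three aligned toy models in chronological order (hypothetical values).
aligned = [
    {"progress": [1.0, 0.0]},
    {"progress": [1.0, 0.0]},
    {"progress": [0.0, 1.0]},
]
series = self_similarity_series("progress", aligned)
```
]]></preformat>
      <p>The series has one value fewer than the number of models, since the first model has no predecessor. In the toy data the word is stable across the first two periods and changes completely in the third.</p>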
    </sec>
    <sec id="sec-5">
      <title>5. Case study results</title>
      <p>In this section, we discuss the results of the approaches presented in Section 4 for semantic shift
detection applied to the Vatican corpus of Section 3 according to pairwise word similarity, word
context comparison, and self word similarity.</p>
      <p>One of the main problems in evaluating the results is that it is difficult to define a ground
truth that provides information about the expected shifts in the Vatican corpus. To address this
issue, we run our tests by exploiting three main categories of words.</p>
      <list list-type="bullet">
        <list-item><p>Words representing long-term concepts in the Vatican publications (e.g., jesus,
eucharist, ...). These terms represent central concepts in the Church, usually related to
theological issues. For these terms, we expect to observe a limited shift of meaning
in the publications of the different popes.</p></list-item>
        <list-item><p>Words representing concepts from the past (e.g., heresy, perversion, ...). These
terms are related to topics that were central in the Vatican publications in the past,
but that nowadays are less present in the popes' publications in favour of new words that
are more strictly related to events and social phenomena that are perceived as important
at the present time. For these terms, we expect to observe a decreasing trend along the
temporal dimension.</p></list-item>
        <list-item><p>Words representing concepts from today (e.g., environment, science, ...), which are
the opposite of the concepts from the past, namely words representing concepts that are
important nowadays and for which we expect to observe a growing trend over time.</p></list-item>
      </list>
      <p>For the sake of readability, the considered words from the Vatican corpus are translated from
Italian to English.</p>
      <p>Pairwise word similarity. In Figure 1, we show examples of word pairs taken from long-term
concepts (first row), concepts from the past (second row), and concepts from today (third row),
respectively. For each pair of words, we compare their cosine similarity in the models trained
on the different popes, exploiting both aligned and non-aligned embeddings. The key point
in this experiment is that the comparison between words is based exclusively on their relative
position in the vector space. As a consequence, we do not have any information about the
stability of the meaning of each single word per se. The only information available is about
the meaning of a word with respect to the other in the same pair. As typically occurs for word
embedding methods, the proximity assumption holds. Thus, if two words are similar (i.e., they
are close in the vector space), we can infer that their meaning is also similar since the two
words are used in similar contexts.</p>
      <p>Concerning long-term concepts (first row of Figure 1), the cosine similarity values for each
word pair are stable over time. In particular, we note that the pairs are essentially composed
of a word and its consolidated epithet (i.e., Virgin Mary, Jesus Christ) or alias (i.e.,
Eucharist, also called Most Blessed Sacrament). Such similarity values suggest that
the meaning shift for these long-term concepts is limited, as expected.</p>
      <p>For the concepts from the past, the trend of the pairwise similarity between the considered
words is decreasing. In particular, we note that pairs having a strong similarity in the past can
be characterized by a lower similarity in the publications of recent popes, like ‹perversion,
novelty›. This means that these words were originally used in the same context, but their
linguistic and thematic context has changed over time, either because they are no longer used
together or because one of the two, or both, are now rarely used.</p>
      <p>For the concepts from today, we observe the opposite phenomenon. The similarity of word
pairs increases over different pontificates, suggesting that the cultural changes that characterize
the 21st century have induced popes to increasingly use the two terms in similar contexts. A
clear example of this behavior is given by the pair ‹science, technique›, where we note
that the word technique tends to become almost a synonym of the word science,
but only after the 1970s, with John Paul II. In this respect, it is also interesting to consider the new
words closely related to a certain pair introduced by a pope in comparison with the dictionary
of the previous one. The new words of a pope are determined as the set difference between
the subset <inline-formula><tex-math>N_t</tex-math></inline-formula> of the vocabulary of a pope <inline-formula><tex-math>t</tex-math></inline-formula> and the entire dictionary of the previous pope
<inline-formula><tex-math>t-1</tex-math></inline-formula>, where <inline-formula><tex-math>N_t</tex-math></inline-formula> is the set of the 30 words closest to the mean vector of a certain word pair in the
embedding model related to <inline-formula><tex-math>t</tex-math></inline-formula>. For example, with respect to the pair ‹environment, planet›,
Francis introduced the words amazonia, biodiversity, deforestation, ecosystem,
energetic, and oceans. With respect to the pair ‹sex, gender›, Francis introduced the words
mistreatment and homosexuals, while for the pair ‹science, technique› John Paul II
introduced the words astronomy, biology, biomedical, branch, cosmology, computer
science, engineering, molecular, psychiatry, and technological.</p>
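      <p>The selection of the new words of a pope can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name and the toy data are hypothetical, closeness to the mean vector is assumed to be measured by cosine similarity, and the two words of the pair are not treated specially when ranking the neighbors.</p>
      <preformat><![CDATA[
```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def new_words(pair, model, previous_vocab, n=30):
    """Words among the n closest to the mean vector of `pair` in `model`
    that do not appear in the vocabulary of the previous pope."""
    w1, w2 = pair
    mean = [(a + b) / 2 for a, b in zip(model[w1], model[w2])]
    ranked = sorted(model, key=lambda w: cosine(mean, model[w]), reverse=True)
    return set(ranked[:n]) - set(previous_vocab)


# Toy model of the current pope and vocabulary of the previous one
# (hypothetical values).
model_t = {
    "environment": [1.0, 0.0],
    "planet": [0.8, 0.2],
    "biodiversity": [0.9, 0.15],
    "ecosystem": [0.85, 0.1],
    "nature": [0.9, 0.05],
    "liturgy": [0.0, 1.0],
}
vocab_prev = {"environment", "planet", "nature", "liturgy"}

introduced = new_words(("environment", "planet"), model_t, vocab_prev, n=5)
```
]]></preformat>
      <p>In the toy data, the five nearest words to the mean vector exclude liturgy, and removing the words already used by the previous pope leaves biodiversity and ecosystem.</p>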
      <p>
        As a further remark, we note that the same trend in the relative position of the
considered word pairs can be detected using either aligned or non-aligned embeddings.<sup>3</sup>
      </p>
      <p><sup>3</sup> The breaks in the lines do not appear in IU models due to the main limitation of this approach: the recognition of
a word change is possible only if the word has enough occurrences in the considered time period. If the occurrences
of a word dramatically decrease (or completely disappear), its word vector will remain the same and hence it is not
possible to observe any change [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].</p>
      <p>Word context comparison. In Figure 2, we consider the target words jesus, environment,
and heresy, and we explore their context composed of the 1, 5, and 10 most similar words in the different
embedding models, namely the words corresponding to the 1, 5, and 10 vectors that are closest to the
vector of the target. The color gradation describes the intensity of the Jaccard similarity value
between any pair of popes. The diagonal always shows the darkest color since the
Jaccard similarity value between a pope and himself is equal to 1. For the word jesus,
the top-1 plot of Figure 2 shows that all the popes except Francis share the most similar word.
This result confirms the observations on pairwise word similarity reported in Figure 1, where
the similarity of the pair ‹jesus, christ› is almost unchanged from Leo XIII to Benedict XVI. Also when the
contexts of top-5 and top-10 words are considered, the stable behavior of jesus can be observed
over different pontificates (i.e., many dark areas can be recognized on the first row). Conversely,
the words environment and heresy are affected by semantic shifts. For the
word environment, a shift can be observed in both Paul VI and Pius XII. For the word
heresy, the shift is less pronounced. The context of the word heresy is more similar among the
popes of the past than among those of recent periods. As an exception, the similarity
values between John Paul II and Benedict XVI denote a semantic shift in the context of the
considered target words. Exploring the closest common words to heresy from Benedict XV
and Pius XII, we find nestorio (i.e., the name of an Archbishop of Constantinople from which
Nestorianism, a doctrine condemned as heretical by the Council of Ephesus in 431, takes its
name), condemnation, and apostasy. When John Paul II and Benedict XVI are considered, the
closest common words to heresy are arian and arianism, which refer to the heresy of Arius,
condemned as a heretic by the First Council of Nicaea in 325.
      </p>
      <p>Self word similarity. In this experiment, we consider the position of a word with respect
to itself, by measuring the similarity of a word vector at time <inline-formula><tex-math>t</tex-math></inline-formula> with respect to the vector of
the same word at time <inline-formula><tex-math>t-1</tex-math></inline-formula>. In particular, in Figure 3, we observe the trend of the self cosine
similarity for the words environment, travel, and progress. The similarity measures are
computed by exploiting both the aligned methods (blue line) and the non-aligned one (green
line). Since Leo XIII is the first pope of our corpus, it is not possible to calculate the self cosine
similarity of a word with respect to the model of the previous pope. For this reason, the lines
reported in the figure start from Benedict XV instead of Leo XIII. Since in this experiment we
compare a word with itself in different models, we expect to observe high values of similarity
with a limited variation. However, this expectation is confirmed only for the aligned methods SO
and IU. This is due to the fact that independent models trained on different corpora of different
periods can be directly compared only when the models are aligned, as occurs in SO and IU. In
the case of Procrustes (PR), the low values of the self cosine similarity reveal that the alignment
mechanism adopted by this method is not suitable for small-sized, unbalanced datasets like
the considered Vatican corpus. According to the literature, low values of self similarity can be
associated with a semantic change of the considered word, while high values of self similarity
denote stable word meanings. As a result, we claim that successive increasing values of self
similarity suggest a strengthening of the word meaning, while successive decreasing values of
similarity suggest a weakening of the word meaning. Regarding the considered target words, we
note that the trends of the self cosine similarity are different for IU and SO models, but they
share the increasing/decreasing direction of some shifts, for example between Paul VI
and John Paul II for the word environment. This can be interpreted as a consolidation of the
word meaning. Furthermore, both SO and IU models share shifts between John XXIII and Paul
VI for the word progress, but this behavior is less evident in the SO model. This can be due to
the dimensionality reduction applied when the second order embeddings are built.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Concluding remarks and future work</title>
      <p>In this paper, we considered different approaches to semantic shift detection and we discussed
the results obtained on a corpus of Vatican publications related to popes from Leo XIII to Francis
(1878-2020). The results show that word embeddings can be successfully employed in semantic
shift detection, even when a small-sized, unbalanced dataset like the Vatican
corpus is considered. Both aligned and non-aligned approaches have been exploited in the proposed case
study. The results reveal that the alignment of embedding models over different vector spaces
is not required when we consider pairs of words belonging to different time periods. Conversely,
successfully detecting the meaning shift of a word over time across different vector
spaces requires the adoption of an alignment mechanism, so that the word vectors belonging
to different periods are comparable. However, when alignment approaches are adopted, our
results show that the change of a word over time can be noisy and the interpretation of the
word behavior can be difficult (e.g., see the case study results of the Procrustes method when
the self word similarity is considered).</p>
      <p>Ongoing and future work is focused on exploring semantic shift detection techniques by
relying on contextualized word embedding models like BERT. In this direction, BERT-like
models make it possible to capture the sense differentiations of a target word, meaning that they can detect
the different meanings of the considered target according to the different contexts in which
the word is used throughout the whole corpus. Furthermore, contextualized embeddings can
leverage the benefits of existing pre-trained models, thus avoiding the execution of a (costly)
training phase over each time-sliced sub-corpus.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This paper is partially funded by the RECON project within the UNIMI-SEED research
programme.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.-m. Au</given-names>
            <surname>Yeung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jatowt</surname>
          </string-name>
          ,
          <article-title>Studying how the Past is Remembered: Towards Computational History Through Large Scale Text Mining</article-title>
          ,
          <source>in: Proc. of the CIKM, ACM</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>1231</fpage>
          -
          <lpage>1240</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bjerva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Praet</surname>
          </string-name>
          ,
          <article-title>Word Embeddings Pointing the Way for Late Antiquity</article-title>
          ,
          <source>in: Proc. of the LaTeCH, ACL</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>53</fpage>
          -
          <lpage>57</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Borin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jatowt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hengchen</surname>
          </string-name>
          (Eds.),
          <source>Computational Approaches to Semantic Change</source>
          ,
          <publisher-name>LSP</publisher-name>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Sagi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kaufmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <article-title>Tracing Semantic Change with Latent Semantic Analysis</article-title>
          ,
          <source>Current Methods in Historical Semantics</source>
          <volume>73</volume>
          (
          <year>2011</year>
          )
          <fpage>161</fpage>
          -
          <lpage>183</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Biemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <article-title>That's sick dude!: Automatic Identification of Word Sense Change across Different Timescales</article-title>
          ,
          <source>arXiv preprint arXiv:1405.4392</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Corrado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Efficient Estimation of Word Representations in Vector Space</article-title>
          ,
          <source>in: ICLR Workshop Papers</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Gonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Jawahar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Seddah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          ,
          <article-title>Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora</article-title>
          ,
          <source>in: Proc. of ACL</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>538</fpage>
          -
          <lpage>555</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-I.</given-names>
            <surname>Chiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hanaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Petrov</surname>
          </string-name>
          ,
          <article-title>Temporal Analysis of Language through Neural Language Models</article-title>
          ,
          <source>arXiv preprint arXiv:1405.3515</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kulkarni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Al-Rfou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Perozzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Skiena</surname>
          </string-name>
          ,
          <article-title>Statistically Significant Detection of Linguistic Change</article-title>
          ,
          <source>in: Proc. of WWW</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>625</fpage>
          -
          <lpage>635</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jatowt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhowmick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tanaka</surname>
          </string-name>
          ,
          <article-title>Omnia Mutantur, Nihil Interit: Connecting Past with Present by Finding Corresponding Terms Across Time</article-title>
          ,
          <source>in: Proc. of ACL</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>645</fpage>
          -
          <lpage>655</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>W. L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          ,
          <article-title>Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change</article-title>
          ,
          <source>arXiv preprint arXiv:1605.09096</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>I.</given-names>
            <surname>Stewart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Arendt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Volkova</surname>
          </string-name>
          ,
          <article-title>Measuring, Predicting and Visualizing Short-Term Change in Word Representation and Usage in VKontakte Social Network</article-title>
          ,
          <source>in: Proc. of ICWSM</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Englhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Willkomm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schäler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Böhm</surname>
          </string-name>
          ,
          <article-title>Improving Semantic Change Analysis by Combining Word Embeddings and Word Frequencies</article-title>
          ,
          <source>International Journal on Digital Libraries</source>
          <volume>21</volume>
          (
          <year>2020</year>
          )
          <fpage>247</fpage>
          -
          <lpage>264</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bamler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mandt</surname>
          </string-name>
          ,
          <article-title>Dynamic Word Embeddings</article-title>
          ,
          <source>in: Proc. of the ICML</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>380</fpage>
          -
          <lpage>389</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosenfeld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Erk</surname>
          </string-name>
          ,
          <article-title>Deep Neural Models of Semantic Shift</article-title>
          ,
          <source>in: Proc. of the NAACL-HLT, ACL</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <article-title>Dynamic Word Embeddings for Evolving Semantic Discovery</article-title>
          ,
          <source>in: Proc. of the WSDM, ACM</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>673</fpage>
          -
          <lpage>681</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H.</given-names>
            <surname>Dubossarsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hengchen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schlechtweg</surname>
          </string-name>
          , Time-Out: Temporal
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>