<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CLiC-it</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Is Change the Only Constant? An Inquiry Into Diachronic Semantic Shifts in Italian and Spanish</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matteo Melis</string-name>
          <email>matteo.melis@studenti.unitn.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anastasiia Salova</string-name>
          <email>anastasiia.salova@studenti.unitn.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Zamparelli</string-name>
          <email>roberto.zamparelli@unitn.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre for Mind/Brain Sciences, University of Trento</institution>
          ,
          <addr-line>Rovereto</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Commons License Attribution 4.0 International</institution>
          ,
          <addr-line>CC BY 4.0</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Workshop Proceedings</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>9</volume>
      <abstract>
        <p>An increasingly prevalent approach to studying the gradual change of word meanings over time involves using distributional semantics, which is based on neighboring words. This study combines methods from Hamilton et al. (2016) [1] and Uban et al. (2019) [2] to analyze deceptive cognate pairs in historical and contemporary Italian and Spanish corpora. By employing fastText word embeddings and various similarity measures, it aims to investigate the change of word meanings and test two laws of regularity proposed by Hamilton et al. (2016) [1], along with a new hypothesized regularity in language change regarding analogy. The findings show a coherent evolution of deceptive cognates across the two languages. However, no meaningful correlation is found regarding the two aforementioned laws. Nevertheless, the results of the hypothesized regularity offer valuable insight into how the context of word usage shifts along with the word.</p>
      </abstract>
      <kwd-group>
        <kwd>Diachronic semantics</kwd>
        <kwd>semantic shifts</kwd>
        <kwd>distributional semantics</kwd>
        <kwd>similarity measures</kwd>
        <kwd>deceptive cognates</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <sec id="sec-2-1">
        <title>1.1. Background</title>
        <sec id="sec-2-1-1">
          <p>
            In recent years, there has been a growing interest in studying the shift of word meanings over time, with word embeddings emerging as a valuable tool for this purpose. Hamilton et al. (2016) [<xref ref-type="bibr" rid="ref32">1</xref>] conducted research on diachronic word embeddings to uncover statistical laws associated with semantic change. They examined the law of conformity, which states that the rate of semantic change of a word is inversely related to its frequency. Additionally, they explored the law of innovation, which proposes that words with greater polysemy undergo semantic change at a higher rate, regardless of how often they are used. Their findings confirmed both hypothesized laws. The study primarily focused on English, aligning word embeddings from different time periods and measuring semantic similarity using cosine similarity.
          </p>
          <p>Dubossarsky et al. (2017) [3] contested the validity of the reported laws of semantic change based on word representation models. Replicating previous studies, they found that the law of conformity and the law of innovation did not hold up under more rigorous control conditions. The negative correlation between word frequency and meaning change was weaker than previously claimed, and the positive correlation between polysemy and meaning change was largely dependent on word frequency, without an independent contribution.<fn><p>© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p></fn></p>
          <p>
            Similarly to Hamilton et al. (2016) [<xref ref-type="bibr" rid="ref32">1</xref>], Uban et al. (2019) [2] investigated semantic divergence across languages by examining cognate sets, which are words with a common origin in different languages. They focused on analyzing modern embeddings to quantify semantic shifts originating from shared etymology, identify false friends (deceptive cognates) in the cognate sets, and measure their score of falseness, namely the dissimilarity between the cognates. The study primarily concentrated on six Romance languages. The authors introduced methodologies such as aligning word embeddings across languages, measuring semantic similarity and divergence between cognate sets, and quantifying the magnitude of semantic changes. Their findings contradict those of Hamilton et al. (2016) [<xref ref-type="bibr" rid="ref32">1</xref>], who found a negative correlation between frequency and meaning shift. However, they agree with Hamilton et al. regarding the law of innovation.
          </p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>1.2. Objectives</title>
        <sec id="sec-2-2-1">
          <p>The primary focus of this study is to investigate the presence of statistical laws governing semantic shifts within the Romance language group, specifically Italian and Spanish. The research questions revolve around exploring the laws of conformity and innovation. It is hypothesized that more frequent words are less likely to undergo semantic shifts, while more polysemous words are more prone to such changes. Additionally, the study introduces a new follow-up analysis on analogy, suggesting that over time periods the meaning of a word which is semantically related to a target (in terms of context-based nearest neighbors) tends to shift in the Euclidean space coherently with the target word.</p>
          <p>The study uses distributional semantics as a methodology to explore language change. A crucial part of this research involves analyzing deceptive cognate pairs, which have a similar or the same form in different languages but diverged in meaning over time, unlike true cognates that retain the same meaning. For instance, Figure 1 illustrates how largo (broad) in Italian and largo (long) in Spanish have diverged in meaning through a semantic shift, despite both words originating from the shared Latin etymon largo (abundant). We believe this allows for a robust comparison of semantic changes, especially in related languages, providing illustrative examples and easily interpretable results. Our primary focus is on systematic semantic change that originates from the shared etymon and continues, while also controlling for the random appearance of lexical units in a language. Moreover, this approach would enable cross-language analysis in prospective studies.</p>
          <p>Our study aims to expand the current understanding of language change by incorporating cognate comparisons across languages and examining individual changes within specific time periods. To enhance the robustness of our analyses, we introduce various similarity measures.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. Corpora</title>
      <sec id="sec-3-1">
        <title>2.1. Corpora Selection Criteria</title>
        <sec id="sec-3-1-1">
          <p>The study uses two different time periods of language usage in its corpora: the 19th and 20th centuries (until 1969) for historical data, and the 21st century for modern data.</p>
          <p>To address the size difference between the two datasets, we reduced the modern data to match the historical data’s size. This was achieved by counting the number of required tokens and removing the tokens exceeding this number. This allowed for two different training sets for the modern data, enabling comparisons and allowing us to draw conclusions about the minimum amount of data needed for these analyses.</p>
        </sec>
        <sec>
          <title>2.1.1. Italian</title>
          <p>Four corpora were collected online for this study: Histcorp [4], ChroniclItaly v3.0 [5], Unità corpus [6], and PAISÀ corpus [7]. The first three corpora were merged to form the historical dataset, covering the years 1805-1969, with a total of 545,068,401 tokens. The PAISÀ corpus represented the modern data, containing 1,089,014,748 tokens, while the reduced modern version consisted of 545,106,781 tokens.</p>
        </sec>
        <sec>
          <title>2.1.2. Spanish</title>
          <p>Similarly, four corpora were collected online for Spanish: Conha19 [8], Impact-es (BVC section) [9], Corpus of Political Speeches [<xref ref-type="bibr" rid="ref33 ref9">10</xref>], and The Large Spanish Corpus [<xref ref-type="bibr" rid="ref2">11</xref>]. The historical data consists of a merged collection of the first three corpora, covering the period from 1830 to 1969 and containing 204,904,549 tokens. The modern data representation utilizes ’The Large Spanish Corpus’ (Wikipedia section), containing 975,251,278 tokens from 2019. Additionally, a reduced version of The Large Spanish Corpus was created, containing 206,900,109 tokens.</p>
        </sec>
      </sec>
      <sec>
        <title>2.2. Pre-processing Techniques</title>
        <sec>
          <p>The pre-processing for both languages followed the same steps. After collecting the text files for each corpus, we used the NLTK library [12] for tokenization and stop-word removal. The files were cleaned by removing URLs, numbers, non-letters, multiple empty spaces, and set to lowercase. For Spanish, diacritic marks were replaced using unicodedata. The spaCy library [13], with its reported accuracy of 0.96 for Spanish and 0.97 for Italian, was employed for lemmatization, and the files were merged into a representative single file for each historical period and language.</p>
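          <p>The cleaning steps described for both languages and the token-count reduction of the modern data can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the function names, the tiny stop-word set and the sample sentence are hypothetical, NLTK tokenization is approximated by whitespace splitting, diacritics are assumed to be stripped via canonical decomposition, and spaCy lemmatization is omitted.</p>
          <preformat><![CDATA[
```python
import re
import unicodedata

# Matches http(s) URLs and bare www. addresses up to the next whitespace.
URL_RE = re.compile(r"https?://\S+|www\.\S+")


def strip_diacritics(text):
    """Drop combining marks after canonical decomposition (assumed strategy)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def clean_text(text, stop_words=frozenset(), remove_diacritics=False):
    """Return lowercase tokens with URLs, digits and other non-letters removed."""
    text = URL_RE.sub(" ", text)
    # Keep letters and whitespace only; everything else becomes a space.
    text = "".join(ch if ch.isalpha() or ch.isspace() else " " for ch in text)
    text = text.lower()
    if remove_diacritics:
        text = strip_diacritics(text)
    # split() also collapses runs of whitespace.
    return [tok for tok in text.split() if tok not in stop_words]


def truncate_tokens(tokens, target_size):
    """Keep only the first target_size tokens, dropping the excess."""
    return tokens[:target_size]


# Hypothetical sample sentence and stop-word set, for illustration only.
sample = "Visita https://example.org: el Niño llegó en 1969!"
tokens = clean_text(sample, stop_words={"el", "en"}, remove_diacritics=True)
reduced = truncate_tokens(tokens, 2)
```
]]></preformat>
          <p>In this sketch stop words are matched after diacritics are stripped, so a real stop-word list would need the same normalization applied to it.</p>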
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>2.3. Cognate Dataset</title>
        <sec id="sec-3-2-1">
          <p>We used an existing resource: an automatically generated multilingual lexicon of false friends [14]. The lexicon provides a falseness score, following the logic that a cognate pair is considered a pair of false friends if a word in the second language is closer in meaning to the original word in the shared semantic space than its cognate in that language.</p>
          <p>For instance, consider the cognate pair (imbarazzata, embarazada), where imbarazzata (embarrassed) is a word in Italian and embarazada (pregnant) is a word in Spanish. If there is a word x in Spanish such that the distance (imbarazzata, x) is less than the distance (imbarazzata, embarazada), then the pair is considered a deceptive cognate pair. Since the Spanish word avergonzada (embarrassed) exists, the pair (imbarazzata, embarazada) constitutes a set of false friends, and the arithmetic difference between these two distances is the score of falseness, which ranges from 0 to 1. It is lower for false friends that are closer in meaning and higher for more distant false friends.</p>
          <p>Given this, we decided to extract the 156 deceptive cognate pairs with a falseness value higher than 0.25. This step was taken to ensure the accuracy of the dataset and to account for the limitations of its unsupervised data collection method.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Methodology</title>
      <sec id="sec-4-2">
        <p>We trained six fastText models [15] in an unsupervised regime using the six corpora that we obtained and prepared. For each model, we employed the skip-gram algorithm, set the vector dimension to 100, and trained for 5 epochs. These parameters are considered default, and as indicated by Mikolov et al. (2013) [16], the algorithm has been found to work well with small datasets. This resulted in three models for each language, trained on historical data, modern data, and modern reduced data, respectively. This produced a total of 6 different vector spaces.</p>
      </sec>
      <sec id="sec-4-3">
        <title>3.2. Embeddings Overview with RSA</title>
        <sec id="sec-4-3-1">
          <p>In order to obtain a comprehensive overview of the vector spaces and as the initial step of our analysis, we computed Representational Similarity Analysis (RSA) between dissimilarity matrices of 156 deceptive cognate words from the dataset by Uban and Dinu (2020) [14]. These matrices were created by extracting vectors for specific cognates from the common vector spaces obtained in the previous step. The aim was to assess general similarity patterns within the word embeddings. Based on the results thus obtained, we chose to exclusively use the model trained on the full modern data and discard the one trained on the reduced modern data to ensure higher-quality word embeddings in later steps. Detailed results of this analysis will be discussed later.</p>
        </sec>
      </sec>
      <sec>
        <title>3.3. K-Nearest Neighbors Retrieval Using a Similarity Measure</title>
        <p>To obtain more qualitative data, the fastText library [15] was used to retrieve embeddings closest to the target cognate in Euclidean space. The retrieval process utilized the K-Nearest Neighbors (K-NN) function, where the cosine similarity measure was employed to compare two vectors. The number of nearest neighbors to retrieve (k) was predetermined and set to 5, 10, 20, and 50 for comparative analysis purposes.</p>
      </sec>
      <sec id="sec-4-1">
        <title>3.4. Semantic Shift Calculation within Each Language</title>
        <sec id="sec-4-2-1">
          <p>Methodologically, the study can be divided into the following steps:<fn><p>All the code can be found at https://github.com/matteomls/diachronic-semantic-shift.</p></fn></p>
          <list list-type="order">
            <list-item><p>We applied Procrustes alignment [<xref ref-type="bibr" rid="ref16">17</xref>] to the two vector spaces (historical to modern for each language) to ensure that similar vectors represented the same concepts across different embedding spaces. This alignment was necessary as the embeddings were trained on different corpora in different languages.</p></list-item>
            <list-item><p>We calculated the cosine similarity between the historical and modern vectors of each cognate.</p></list-item>
            <list-item><p>We counted the occurrences of each cognate word in both the historical and modern corpora in Italian and Spanish.</p></list-item>
            <list-item><p>We normalized the occurrences of cognate words by dividing each count by the sum of all counts, so that the normalized values sum to 1 and replace the raw frequency values.</p></list-item>
          </list>
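          <p>The cosine-similarity and frequency-normalization steps listed above can be illustrated with a short sketch. It is only an approximation under stated assumptions: the vectors are taken to be already aligned (the Procrustes step is not implemented here), and the words, two-dimensional vectors and counts are hypothetical toy values rather than data from the study.</p>
          <preformat><![CDATA[
```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def normalize_counts(counts):
    """Divide each count by the sum of all counts so the values sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {word: 0.0 for word in counts}
    return {word: count / total for word, count in counts.items()}


# Hypothetical, already-aligned toy vectors for two words in two periods.
historical = {"largo": [1.0, 0.0], "caldo": [0.0, 1.0]}
modern = {"largo": [0.0, 1.0], "caldo": [0.0, 2.0]}

# Lower similarity between periods is read as a larger semantic shift.
similarity_over_time = {
    word: cosine_similarity(historical[word], modern[word]) for word in historical
}

# Hypothetical raw occurrence counts.
relative_freq = normalize_counts({"largo": 30, "caldo": 10})
```
]]></preformat>
          <p>In this toy setup the first word's vectors are orthogonal across periods (similarity 0) while the second word's vectors point in the same direction (similarity 1), and the normalized frequencies are 0.75 and 0.25.</p>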
        </sec>
        <sec id="sec-4-3-2">
          <p>Using the NumPy library [18], we computed the correlation coefficient and linear regression coefficients of the frequency and semantic shift across time. In this analysis, we incorporated polysemy covariance, considering the correlation between polysemy and frequency.</p>
        </sec>
      </sec>
      <sec>
        <title>3.6. Word Polysemy and Semantic Divergence Analysis</title>
        <p>After conducting the frequency and semantic divergence analysis, we proceeded to measure the polysemy of words. To accomplish this, we utilized the WordNet library [19], specifically leveraging the functionality provided by the ”nltk.corpus.wordnet” module. Polysemy was quantified as the number of synsets associated with a word in WordNet, following the methodology described by Uban et al. (2019) [2].</p>
        <p>Subsequently, we investigated the correlation between the cosine similarity over time, which indicates the degree of semantic shifting, and the number of meanings a word can have according to WordNet. In this analysis, we took into account the co-variance with frequency, similarly to our previous approach.</p>
      </sec>
      <sec id="sec-4-4">
        <title>3.7. Word Analogy and Semantic Divergence Analysis</title>
        <sec id="sec-4-5-1">
          <p>In addition to the previous analyses, we further examined how the cosine similarity changes over time for the K-Nearest Neighbors (K-NN) that overlap between the two time periods. For each cognate word, we employed a K-NN approach with varying values of K (5, 10, 20, 50). We examined the nearest neighbors (NN) that appear in both the historical and modern lists of NN. For each overlapping NN, we calculated the cosine similarity to the target cognate in each period and measured the difference between the two, determining whether the NN moved closer to or further from the target cognate word.</p>
          <p>By calculating the ratio of positive (closer) or negative (further) shifts, we could assess the coherence (the consistency of neighbors’ movement relative to the target cognate) of the shift in the K-NN of that specific target cognate word. To identify significant coherent shifts, we set a threshold (&gt;0.75). This threshold was chosen to be substantially higher than chance, ensuring a rigorous approach. If the ratio exceeds this threshold, it implies a major coherent shift in the K-NN of the target cognate word.</p>
          <p>In carrying out this analysis for all the cognates in the list, we removed those that had 0 or 1 overlapping NN, since they do not provide informative results.</p>
        </sec>
      </sec>
      <sec>
        <p>Furthermore, when comparing the reduced historical Spanish embedding space with the modern embedding space, a difference of 0.0956 is observed (b). Therefore, while the results for Italian remain consistent between the full and reduced spaces, reducing the Spanish modern space to match the historical space produces different outcomes compared to using the full modern space. Given the choice between data quality and balance, we have opted for better data quality by discarding the models trained with reduced datasets.</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.2. Calculation of Semantic Shifts</title>
        <sec id="sec-4-5-2">
          <title>4.2.1. Within-Language Comparison: K-NN with Jaccard Distance</title>
          <p>For the K-Nearest Neighbors (K-NN) values of 5, 10, 20, and 50, the results are presented in Appendices B and C (Tables 3 to 10). These tables display the average number of overlapping nearest neighbors in the cognate list, the ratio of overlapping nearest neighbors relative to the extracted K-NN, and the Jaccard distance. Please refer to the Appendix for a detailed representation of these values.</p>
        </sec>
        <sec id="sec-4-5-3">
          <title>4.2.2. Inter-Language Comparison: K-NN with Jaccard Distance</title>
          <p>The values in Appendix D (Tables 11 and 12) represent dissimilarity scores, specifically semantic shifts, calculated using the Jaccard distance (1 - Jaccard index). The Pearson correlation score of 0.999 indicates a strong correlation between the shifts for Italian and Spanish as the K value increases. Overall, the scores show compatible semantic shifts. However, in this analysis, we can only infer the magnitude of the shifts and not the patterns, which will be explored in later analyses.</p>
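          <p>As a rough illustration of the Jaccard distance (1 - Jaccard index) applied to two neighbor lists, the following sketch compares a historical and a modern list of nearest neighbors for one target word. The function name and the neighbor words are hypothetical and chosen only for the example.</p>
          <preformat><![CDATA[
```python
def jaccard_distance(neighbors_a, neighbors_b):
    """1 - |A intersect B| / |A union B| for two collections of neighbor words."""
    set_a, set_b = set(neighbors_a), set(neighbors_b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty lists are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


# Hypothetical 5-nearest-neighbor lists of one target word in two periods.
historical_nn = ["ampio", "vasto", "grande", "esteso", "spazioso"]
modern_nn = ["ampio", "grande", "lungo", "stretto", "corto"]

overlap = set(historical_nn) & set(modern_nn)
overlap_ratio = len(overlap) / len(historical_nn)  # overlap relative to K
distance = jaccard_distance(historical_nn, modern_nn)
```
]]></preformat>
          <p>Here two of the five neighbors are shared and the union contains eight distinct words, so the overlap ratio is 0.4 and the Jaccard distance is 0.75.</p>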
        </sec>
      </sec>
      <sec id="sec-4-6">
        <title>4.3. Law of Conformity</title>
      </sec>
      <sec id="sec-4-7">
        <title>4.4. Law of Innovation</title>
        <p>Conversely, in our study the results for the law of innovation (more polysemy = greater shift), depicted in Figure 2 (lower), differ from those reported by Hamilton et al. (2016) [<xref ref-type="bibr" rid="ref32">1</xref>] and Uban et al. (2019) [2]. While we observed a moderate positive trend, similar to that of the law of conformity, with correlation scores of 0.401 for Italian and 0.417 for Spanish, the partial correlation, which accounts for the frequency confound, reveals weaker values of 0.249 for Italian and 0.188 for Spanish. These findings suggest that the data does not provide strong support for the existence of the law of innovation in Romance languages. However, due to the weak partial correlations observed, it is challenging to draw definitive conclusions.</p>
      </sec>
      <sec id="sec-4-8">
        <title>4.5. Law of Analogy</title>
        <p>One trend that emerges from our study is that semantically related words (as indicated by contextual nearest neighbors) tend to shift coherently closer to or farther from the target word. Table 1 and Table 2 provide supporting evidence for this observation: as the number of nearest neighbors (K-NNs) increases, the ratio of coherent shifts tends to decrease. This aligns with the intuition that with more K-NNs, the distances between the neighbors and their target cognate increase, leading to less consistent shifts. To provide a visual representation, Figure 3 displays an example visualization for a single cognate pair.</p>
        <table-wrap>
          <label>Table 1</label>
          <caption>
            <p>Analogy analysis for Italian</p>
          </caption>
          <table>
            <thead>
              <tr><th>K-NN</th><th>N° of Cognates</th></tr>
            </thead>
            <tbody>
              <tr><td>5</td><td>53</td></tr>
              <tr><td>10</td><td>83</td></tr>
              <tr><td>20</td><td>104</td></tr>
              <tr><td>50</td><td>121</td></tr>
            </tbody>
          </table>
        </table-wrap>
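        <p>One possible reading of the coherence ratio behind this analysis is sketched below. It assumes that the ratio is the share of overlapping neighbors moving in the dominant direction (closer or further) and that a shift counts as coherent when this share exceeds 0.75; the function name, neighbor labels and similarity values are hypothetical, and the study's own code may compute the ratio differently.</p>
        <preformat><![CDATA[
```python
COHERENCE_THRESHOLD = 0.75  # threshold value taken from the text


def coherence_ratio(hist_sims, modern_sims):
    """Share of overlapping neighbors moving in the dominant direction.

    Both arguments map a neighbor word to its cosine similarity with the
    target cognate in one period. Returns None when fewer than two
    neighbors overlap, since such cases are not informative.
    """
    shared = set(hist_sims) & set(modern_sims)
    if len(shared) < 2:
        return None
    closer = sum(1 for w in shared if modern_sims[w] > hist_sims[w])
    further = sum(1 for w in shared if modern_sims[w] < hist_sims[w])
    return max(closer, further) / len(shared)


# Hypothetical similarities of five overlapping neighbors to one target.
hist = {"a": 0.5, "b": 0.4, "c": 0.6, "d": 0.3, "e": 0.7}
modern = {"a": 0.6, "b": 0.5, "c": 0.7, "d": 0.2, "e": 0.8}

ratio = coherence_ratio(hist, modern)
is_coherent = ratio is not None and ratio > COHERENCE_THRESHOLD
```
]]></preformat>
        <p>In this toy case four of the five overlapping neighbors move closer to the target and one moves further away, giving a ratio of 0.8, which is above the threshold.</p>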
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>The hypothesized regularity regarding analogy, a follow-up analysis in this study, has provided intriguing insights into semantic shifts. However, it is important to note that further research into this topic is necessary to validate and expand upon these initial findings.</p>
      <p>On the other hand, the analyses conducted in this study do not yield definitive results supporting the statistical laws of semantic shifts. Firstly, the RSA evaluation of the embedding spaces revealed that the scarcity of data significantly impacted the quality of the embeddings. Furthermore, while the law of conformity agrees with previous literature in a general trend, such as Hamilton et al. (2016) [<xref ref-type="bibr" rid="ref32">1</xref>], our study identified a contrasting trend for the law of innovation. This discrepancy in findings may be attributed to the limitation of our study, namely the scarcity of data resulting from the use of relatively short time periods.</p>
      <p>An additional factor is the relatively short temporal distance between the historic (as recent as 1969) and the modern corpora. Increasing this span is likely to lead to greater shifts, but also to greater data sparsity.</p>
      <p>Last but not least, the alignment technique employed for matching the embedding spaces could have contributed to the divergent outcomes in the analysis of the law of conformity and the law of innovation.</p>
      <p>It is noteworthy that both the law of conformity and the law of innovation conform to the findings of Dubossarsky et al. (2017) [3]. Their study revealed that the suggested positive correlation between meaning change and polysemy was primarily influenced by word frequency, and the correlation between word frequency and meaning change is indeed weaker. Here, after conducting partial correlation analysis, a weak correlation was observed. Furthermore, we noticed a high compatibility between frequency and polysemy, indicating an inherent dependence, despite our efforts to disentangle them using partial correlation.</p>
      <p>Utilizing the fastText model, known for its improved performance on non-English languages, and pre-processing freely available data, the results still highlight poor quality embeddings. This underscores the need for ongoing research and development of word embedding models, alongside the creation of larger, well-curated diachronic corpora. Improving data quality and quantity can enhance the accuracy and reliability of future studies in the field.</p>
      <p>It is important to note that due to the limitations of the embeddings used in this study, the shifts observed in the inter-language Jaccard distance analysis are relatively small and close to each other. This leads to an extremely high correlation coefficient between the languages being analyzed, which should be interpreted with caution.</p>
      <p>In addition to the aforementioned directions, other potential areas of research include expanding further in time and broader in the scope of languages. For instance, this could involve going beyond the Romance or even the Indo-European language family to conduct a more comprehensive investigation into language change.</p>
      <sec>
        <title>Acknowledgements</title>
        <p>We would like to express our gratitude to Dr. Raffaella Bernardi for her support and feedback throughout this project, which has been helpful in shaping our research. We also appreciate her encouragement regarding the conference submission.</p>
        <p>We also extend our gratitude to Dr. Lorella Viola for her generous assistance in providing a portion of the corpus used in our analysis.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>A. RSA Correlation of Italian and Spanish</title>
      <p>[Figure and table residue. Recoverable labels: "8 cognates not found"; "30 cognates not found"; "N° of overlap". Listed values: 0.88015264, 0.8567517, 0.8563638, ..., 0.3236405, 0.3200544, 0.18371347.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          https://aclanthology.org/
          <year>2020</year>
          .lrec-
          <volume>1</volume>
          .
          <fpage>116</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Cañete</surname>
          </string-name>
          , Compilation of large spanish unanno[1]
          <string-name>
            <given-names>W. L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          , Diachronic tated corpora,
          <source>Zenodo</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <article-title>word embeddings reveal statistical laws of semantic</article-title>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bird</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Klein</surname>
          </string-name>
          , E. Loper, Natural language process-
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <article-title>change, in: Proceedings of the 54th Annual Meet- ing with Python: analyzing text with the natural</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <article-title>ing of the Association for Computational Linguis- language toolkit,</article-title>
          <string-name>
            <surname>O'Reilly Media</surname>
          </string-name>
          , Inc.,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>tics (Volume</source>
          <volume>1</volume>
          :
          <string-name>
            <surname>Long</surname>
            <given-names>Papers)</given-names>
          </string-name>
          , Association for Com- [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Honnibal</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Montani</surname>
          </string-name>
          , spaCy 2: Natural language
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>putational Linguistics</source>
          , Berlin, Germany,
          <year>2016</year>
          , pp.
          <article-title>understanding with Bloom embeddings</article-title>
          , convolu-
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          1489-
          <fpage>1501</fpage>
          . URL: https://aclanthology.org/P16-1141.
          <article-title>tional neural networks and incremental parsing,</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <source>doi:10</source>
          .18653/v1/
          <fpage>P16</fpage>
          -
          <lpage>1141</lpage>
          .
          <year>2017</year>
          . [2]
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Uban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Ciobanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Dinu</surname>
          </string-name>
          , Studying [14]
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Uban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Dinu</surname>
          </string-name>
          , Automatically building a
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <year>2019</year>
          , pp.
          <fpage>161</fpage>
          -
          <lpage>166</lpage>
          . doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>W19</fpage>
          - 4720. guage Resources and Evaluation,
          <year>2020</year>
          . URL: https: [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Dubossarsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Weinshall</surname>
          </string-name>
          , E. Grossman, Outta //api.semanticscholar.org/CorpusID:218973843.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <article-title>control: Laws of semantic change and inherent biases in word representation models</article-title>
          , in:
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <source>Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Copenhagen, Denmark,
          <year>2017</year>
          . URL: https://aclanthology.org/D17-1118. doi:10.18653/v1/D17-1118.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [15]
          <string-name><given-names>P.</given-names> <surname>Bojanowski</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Grave</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Joulin</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Mikolov</surname></string-name>,
          <article-title>Enriching word vectors with subword information</article-title>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>5</volume>
          (
          <year>2017</year>
          )
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [16]
          <string-name><given-names>T.</given-names> <surname>Mikolov</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>G. S.</given-names> <surname>Corrado</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Dean</surname></string-name>,
          <article-title>Efficient estimation of word representations in vector space</article-title>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          in: International Conference on Learning Representations,
          <year>2013</year>
          . URL: https://api.semanticscholar.org/CorpusID:5959482.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name><given-names>J.</given-names> <surname>Gower</surname></string-name>,
          <article-title>Generalized Procrustes analysis</article-title>
          ,
          <source>Psychometrika</source>
          <volume>40</volume>
          (
          <year>1975</year>
          )
          <fpage>33</fpage>
          -
          <lpage>51</lpage>
          . URL: https://EconPapers.repec.org/RePEc:spr:psycho:v:40:y:1975:i:1:p:33-51.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name><given-names>C. R.</given-names> <surname>Harris</surname></string-name>,
          <string-name><given-names>K. J.</given-names> <surname>Millman</surname></string-name>,
          <string-name><given-names>S. J.</given-names> <surname>van der Walt</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Gommers</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Virtanen</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Cournapeau</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Wieser</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Taylor</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Berg</surname></string-name>,
          <string-name><given-names>N. J.</given-names> <surname>Smith</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Kern</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Picus</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Hoyer</surname></string-name>,
          <string-name><given-names>M. H.</given-names> <surname>van Kerkwijk</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Brett</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Haldane</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Fernández del Río</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Wiebe</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Peterson</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Gérard-Marchant</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Sheppard</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Reddy</surname></string-name>,
          <string-name><given-names>W.</given-names> <surname>Weckesser</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Abbasi</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Gohlke</surname></string-name>,
          <string-name><given-names>T. E.</given-names> <surname>Oliphant</surname></string-name>,
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <article-title>Array programming with NumPy</article-title>
          ,
          <source>Nature</source>
          <volume>585</volume>
          (
          <year>2020</year>
          )
          <fpage>357</fpage>
          -
          <lpage>362</lpage>
          . doi:10.1038/s41586-020-2649-2.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [19]
          <string-name><given-names>C.</given-names> <surname>Fellbaum</surname></string-name>,
          <source>WordNet: An Electronic Lexical Database</source>
          , Bradford Books,
          <year>1998</year>
          . URL: https://mitpress.mit.edu/9780262561167/.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [4]
          <string-name><given-names>E.</given-names> <surname>Pettersson</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Megyesi</surname></string-name>,
          <article-title>The HistCorp collection of historical corpora and resources</article-title>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          in: Digital Humanities in the Nordic Countries Conference,
          <year>2018</year>
          . URL: https://api.semanticscholar.org/CorpusID:19243754.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [5]
          <string-name><given-names>L.</given-names> <surname>Viola</surname></string-name>,
          <string-name><given-names>A. M.</given-names> <surname>Fiscarelli</surname></string-name>,
          <article-title>ChroniclItaly 3.0. A deep-learning, contextually enriched digital heritage collection of Italian immigrant newspapers published in the USA 1898-1936</article-title>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          in: Proceedings of the Conference,
          <year>2021</year>
          . doi:10.5281/zenodo.4596345.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [6]
          <string-name><given-names>P.</given-names> <surname>Basile</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Caputo</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Caselli</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Cassotti</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Varvara</surname></string-name>,
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          in:
          <source>CLiC-it 2020 Italian Conference on Computational Linguistics 2020</source>
          , volume
          <volume>2769</volume>
          , CEUR Workshop Proceedings (CEUR-WS.org),
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [7]
          <string-name><given-names>V.</given-names> <surname>Lyding</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Stemle</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Borghetti</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Brunello</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Castagnoli</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Dell'Orletta</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Dittmann</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Lenci</surname></string-name>,
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name><given-names>V.</given-names> <surname>Pirrelli</surname></string-name>,
          <source>PAISÀ corpus of Italian web text</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          URL: http://hdl.handle.net/20.500.12124/3, Eurac Research CLARIN Centre.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>U.</given-names>
            <surname>Henny-Krahmer</surname>
          </string-name>
          , Corpus de novelas
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <article-title>hispanoamericanas del siglo XIX (conha19) version</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          1.0.1, in: Proceedings of the Conference,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          doi:10.5281/zenodo.4781947. [9]
          <string-name>
            <given-names>F.</given-names>
            <surname>Sánchez-Martínez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Martínez-Sempere</surname>
          </string-name>
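          , X. Ivars-Ribes, R. C. Carrasco, An open diachronic corpus of historical Spanish,
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <source>Language Resources and Evaluation</source>
          <volume>47</volume>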
          (
          <year>2013</year>
          )
          <fpage>1327</fpage>
          -
          <lpage>1342</lpage>
          . [10]
          <string-name>
            <given-names>E.</given-names>
            <surname>Álvarez-Mellado</surname>
          </string-name>
          ,
          <article-title>A corpus of Spanish political</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          <article-title>speeches from 1937 to 2019</article-title>
          , in: Proceedings of
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          the Twelfth Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France,
          <year>2020</year>
          , pp.
          <fpage>928</fpage>
          -
          <lpage>932</lpage>
          . URL:
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>