<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Computational Humanities Research Conference, November</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Adjusting Scope: A Computational Approach to Case-Driven Research on Semantic Change</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lauren Fonteyn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Enrique Manjavacas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Leiden University Centre for Linguistics, Department of English Language and Culture</institution>
          ,
          <addr-line>Arsenaalstraat 1, 2311CT, Leiden</addr-line>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>1</volume>
      <fpage>7</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>Computational studies of semantic change are often wide in scope, aiming to capture and quantify semantic change in language at large in a data-driven, 'hands-off' way. Case-driven, corpus-linguistic studies of semantic change, by contrast, generally aim to tackle questions about the development of specific linguistic phenomena. Due to its narrower scope, case-driven research is more restricted in terms of the data it may employ, and at the same time it requires a more fine-grained description of the targeted linguistic developments. As a result, case-driven studies face particular methodological challenges that are not at play in more wide-scoped approaches. The aim of this paper is to set out a 'hands-off' computational procedure to study specific cases of semantic change. The case we address is the development of the phrasal expression to death from a literal, resultative phrase (e.g. he was beaten to death) into an intensifier (e.g. We were just pleased to death to see her). We deploy hierarchical clustering algorithms over distributed meaning representations in order to capture the evolution of the semantic space of verbs that collocate with to death. We then describe the resulting diachronic processes by means of monotonic effects, providing a more accurate picture than customary linear regression models. The methodology we outline may help tackle some common challenges in the use of vector representations to study similar cases of semantic change. We end the discussion by pinpointing (remaining) challenges that case-driven research may encounter.</p>
      </abstract>
      <kwd-group>
        <kwd>Linguistics</kwd>
        <kwd>Semantic Change</kwd>
        <kwd>Grammaticalization</kwd>
        <kwd>Distributional Semantics</kwd>
        <kwd>Bayesian Modeling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Over the past decade, computational approaches to semantic change have experienced a surge
in popularity. This is largely due to the rise of an increasingly powerful body of models that
aim to approximate the meaning of words over time by encoding their linguistic context (or
‘distributional properties’) into (diachronic) word embeddings [see, among many others: 47,
19, 35, 18, 1, 44, 46, 29, 26, 13, 51, 11, 48, 5, 50]. A characteristic of many of these studies
is that their research questions are very wide in scope: their aim is not to address questions
about any specific word or construction, but rather to capture and quantify some aspect of
semantic change at large in a data-driven way. As such, these studies tend to approach semantic
change in bulk, with sample sizes ranging from hundreds [e.g. 35, 13, 48] to thousands of
linguistic items [e.g. 18], and with specific examples of semantic change predominantly serving
as (straightforward) illustrations of a more general pattern or trend.</p>
      <p>
        Yet, vector-based models are also used in more narrow-scoped, case-driven research. In this
type of research, which is perhaps most common in (corpus) linguistics, the aim is to approach
a specific case study of semantic change in a largely automated, data-driven manner. The
motivation for doing so is that introspective data annotation (which is not only labour-intensive,
but potentially problematically subjective [
        <xref ref-type="bibr" rid="ref47">47</xref>
        ]) is avoided or minimized. Additionally, the use
of vector-based models allows researchers to operationalize theoretical concepts in quantifiable
terms in order to verify or falsify hypotheses on the nature and causes of semantic change in
the case under scrutiny [e.g. 21, 10, 41, 40].
      </p>
      <p>Although the two approaches share obvious common ground, case-driven
investigations are clearly distinct from wide-scope computational studies in a number of ways.
Most importantly, case-driven research generally emerges from a desire to tackle questions
about the development of a specific linguistic phenomenon (often during a specific time window).
Consequently, compared to wide-scope computational research into semantic change,
case-driven research is relatively inflexible in terms of the data it may employ to attain its goals.
Furthermore, the very reason why the researcher is compelled to undertake case-driven research
is that the specific phenomenon under scrutiny constitutes a complex challenge. As such, while
it is not uncommon for case-driven studies to use computational models and metrics proposed
in wide-scoped studies on semantic change, the specific cases they scrutinize may present
methodological challenges that are either not at play in, or glossed over by, the studies they draw
from.</p>
      <p>
        The aim of the present contribution is to carry out a case-driven study by means of
computational methods. In doing so, our work ties in with earlier explorative work aimed at pinpointing
where challenges may lie for case-driven research [e.g. 49]. The specific case of semantic change
we address is the historical development of the phrasal expression to death from a literal,
resultative phrase (e.g. he was beaten to death) into an intensifier (e.g. We were bored/pleased
to death) [
        <xref ref-type="bibr" rid="ref23 ref33">23, 33</xref>
        ].
      </p>
      <sec>
        <title>1.1. Aims</title>
        <p>More specifically, we aim to delineate a step-wise procedure that:
1. minimizes manual work, so that case-driven research can exploit the available data more
fully and become as data-driven as possible;
2. flags and discusses remaining pitfalls and challenges that future case-driven work may
encounter.</p>
      </sec>
      <sec id="sec-1-1">
        <title>1.2. Outline</title>
        <p>To analyse the development of to death with minimal manual interference, we suggest a
procedure consisting of the following steps:
1. Surveying work on the linguistic construction (and related cases) under scrutiny to (i)
delineate a time window, and (ii) formulate hypotheses or expectations that can be
verified by means of computational methods (Section 2);
2. Compiling (and curating) a sufficiently large diachronic corpus collection (Section 3),
from which examples of the construction can be sampled (Section 3.1);
3. Computing and evaluating distributed meaning representations (Section 3.2);
4. Conducting a diachronic cluster analysis, in which we select, for each period, the number of
clusters that maximizes the silhouette score in order to trace changes in to death’s contextual
distribution (Section 4.1);
5. Conducting a sentiment analysis to capture to death’s decreasing negativity (Section 4.2);
6. Assessing the output of the statistical model against the formulated expectations (Section
5).</p>
        <p>After describing the procedure and results, we highlight and discuss the following remaining
pitfalls (Section 6):
1. Because case-driven research aims to examine a fixed (set of) linguistic construction(s)
in a specific time window, researchers may run into issues of data sparsity and balance
that may be difficult to circumvent.
2. The extent to which a reliable, completely automated, ‘hands-off’ approach is possible
and extendable to new cases not yet analysed remains an open question. While we
are optimistic about incorporating computational models into the study of case-driven
semantic change, manual interference may still be desirable or even required.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The methods adopted in the present study are similar to those used in large-scale computational
studies of semantic change. Previous work has suggested various ways of improving the models
that generate (diachronic) word embeddings [e.g. 46, 44], determining (predictive) laws of
(lexical) semantic change at large [e.g. 19, 14], and developing statistical measures that help
detect different types of semantic change (e.g. specification vs. broadening; cultural change
vs. linguistic change) in a data-driven manner [e.g. 47, 35, 18, 11, 13, 48, 16]. In other work,
computational models are applied to map changes in specific word classes (e.g. intensifiers;
[
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]), (groups of) concepts in particular lexical domains (e.g. ‘racism’, ‘knowledge’; [
        <xref ref-type="bibr" rid="ref2 ref49">49, 2</xref>
        ])
or registers (e.g. ‘scientific language’; [
        <xref ref-type="bibr" rid="ref5 ref50">5, 50</xref>
        ]).
      </p>
      <p>
        In terms of its focus, aims, scope and granularity, this study is reminiscent of research
in corpus linguistics and construction grammar, where a single case of linguistic change is
considered. The development of to death from a phrase that expresses the result of an action
(e.g. He was beaten/stabbed/shot to death) to an intensifying or ‘amplifying’ expression (e.g.
We were thrilled/pleased/shocked to death to see you; [
        <xref ref-type="bibr" rid="ref42">42</xref>
        ]) has been described as a process of
grammaticalization, which took place over the course of the Early and Late Modern English
period (ca. 1500–present). As explained by Margerie [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ], this grammaticalization process,
in which to death developed a less literal and more ‘grammatical’ reading of amplification,
crucially involved ‘host-class expansion’ [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. More specifically, the development can be broken
down into three stages.
      </p>
      <p>STAGE 1 Initially, to death functioned as an adverbial complement of verbs expressing physical harm,
which may result in death (e.g. beat, bleed, burn, etc.).</p>
      <p>
        STAGE 2 Over the course of the 16th and 17th centuries, to death sporadically started occurring in
contexts where a literal, death-resulting reading is ruled out (e.g. That book bored me to
death). It was not until the 18th century, however, that to death was frequently used in
such non-literal, intensifying cases [
        <xref ref-type="bibr" rid="ref23 ref33">33, 23</xref>
        ]. Notably, as is common in intermediate stages
of grammaticalization [
        <xref ref-type="bibr" rid="ref25 ref30">25, 30</xref>
        ], to death still retained some of its original meaning of a
‘negative end result’ [33, p. 129]. At this stage, the vast majority of its collocate verbs
have negative connotations (e.g. bore, scare, worry).
      </p>
      <p>
        STAGE 3 Despite its persistent preference for negative situations, to death started to expand
further [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ]. In the 19th and 20th centuries, to death began to combine with more positively
oriented verbs (e.g. amuse, love, thrill).
      </p>
      <p>
        The expansion process spawned by the grammaticalization of to death would seem to lend
itself well to computational analysis. A template for the general research design can be found in
the work of Perek [
        <xref ref-type="bibr" rid="ref40 ref41">41, 40</xref>
        ]. With an eye on quantifying processes related to host class expansion,
Perek relies on semantic vector representations of the verb types occurring in the open verb
slot of the hell-construction (e.g. [beat/scare/hug] the hell out of someone) and the
way-construction (e.g. [swim/beat/smile] one’s way to something), employing cluster density
measures in order to quantify the diachronic process.
      </p>
      <p>Crucially, Perek demonstrates that, from a linguistic perspective, it is important to
approach processes of host class expansion in a way that distinguishes changes in lexical diversity
(measured by the number of unique lexical items that occur in a construction) from
semantic diversity (measured by the semantic similarity between those lexical items). This is also
relevant for the study of to death, because changes in lexical diversity alone may indicate
cultural rather than linguistic change. It may be the case, for instance, that
different modes of execution have become prevalent or obsolete, or that the specificity and
lexical diversity with which causes of death are described may increase or decrease as the topic
becomes more or less taboo. In these scenarios, the set of lexical items to death collocates
with may indeed shrink or expand, but the semantics of the phrase remain stable. At the
same time, such cultural change may happen alongside the grammaticalization of to death into
an intensifier. Thus, the reality of case-driven research may be that the distinction between
cultural and linguistic change is not a matter of ‘either/or’ [18], but of ‘and’.</p>
      <p>
        The distributed meaning representations that are fed into the clustering algorithm by Perek
[
        <xref ref-type="bibr" rid="ref40 ref41">41, 40</xref>
        ] do, however, fail to distinguish synonymy (e.g. hate &amp; despise) from antonymy (e.g.
hate &amp; love). Hence, they will not capture the final stage of expansion of to death and
other intensifying constructions, which commonly involves an erosion of its original negative
(or positive) polarity [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Data</title>
      <p>For the purposes of the present study, we gathered a collection of diachronic English corpora,
spanning the period from 1550 to 1949. These corpora include Early English Books Online
(EEBO), the Corpus of Late Modern English Texts (version 3.1; CLMET3.1), the Evans Early
American Imprints Collection (EVANS), Eighteenth Century Collections Online (ECCO), the
Corpus of Historical American English (COHA), and the Hansard corpus (Hansard). In terms
of text types, these corpora are varied, covering an array of literary works, religious and legal
texts, and news reports. The sole exception is Hansard, which offers transcriptions of British
parliamentary debates (starting in 1800).</p>
      <p>
        All corpora were submitted to the following pre-processing pipeline. First, we applied
language identification in order to filter out non-English text. We relied on two language
identification modules – Google’s Compact Language Identifier (v3)1 and the FastText Language
Identification system [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] – which we combined to maximize the precision with which foreign text is retrieved.
A given fragment of 500 characters was flagged as foreign if both
systems indicated a language other than English as the highest-probability language. Manual
inspection of a random sample indicated a false positive rate low enough for the
filtering to be effective (while discarding an insignificant amount of English text).
      </p>
      <p>1Our code repository is accessible through the following url: https://github.com/google/cld3/releases/tag/3.0.13.</p>
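      <p>The agreement rule for flagging foreign fragments can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the two identifiers are passed in as hypothetical callables returning the top-ranked language code (for example, thin wrappers around CLD3 and the fastText language-identification model), and we assume that English is reported as "en".</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Callable, List

# Hypothetical interface: a language identifier maps a text fragment to its
# top-ranked language code, e.g. "en". Stand-ins for CLD3 / fastText wrappers.
LangID = Callable[[str], str]


def split_fragments(text: str, size: int = 500) -> List[str]:
    """Cut a document into consecutive fragments of `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def is_foreign(fragment: str, lid_a: LangID, lid_b: LangID) -> bool:
    """Flag a fragment only if BOTH systems rank a non-English language first."""
    return lid_a(fragment) != "en" and lid_b(fragment) != "en"


def filter_english(text: str, lid_a: LangID, lid_b: LangID) -> str:
    """Keep the fragments that are not flagged as foreign, in their original order."""
    kept = [f for f in split_fragments(text) if not is_foreign(f, lid_a, lid_b)]
    return "".join(kept)
```
]]></preformat>
      <p>Because a fragment is discarded only when both systems agree that it is not English, the combination favours precision over recall, in line with the goal of discarding as little English text as possible.</p>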
      <p>
        Second, we tokenized and sentence-tokenized the remaining text using the Punkt tokenizers
provided by the NLTK package [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. After tokenization, we enriched all text with part-of-speech
tags, using an in-house tagger for historical English. The tagger was trained on the PCEEME
[
        <xref ref-type="bibr" rid="ref36">36</xref>
        ] – a corpus of letters from 1410 to 1695 that amounts to about 2.2M labelled tokens – using
a Neural Conditional Random Field (CRF) tagger implemented with PIE [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ], and obtained
an overall test-set accuracy above 96%.
      </p>
      <p>The resulting patchwork corpus consists of a total of 3.9B tokens, which we utilized in various
ways in subsequent steps of the research process.</p>
      <sec id="sec-3-1">
        <title>3.1. Dataset: to death</title>
        <p>The attestations of to death were retrieved from the corpus collection (excepting the specialized
Hansard corpus). As is common in linguistic research, the data was divided into fixed-width
bins. Each bin represents a 50-year period, which results in a total of 8 bins. As not all corpora
in the collection are balanced in terms of the amount of text a single author may contribute,
we applied an additional sampling step to ensure that no author dominated more than 25% of
the instances in a particular bin.</p>
        <p>The total number of instances retrieved from each corpus per bin is listed in Table 1. In the
bin covering the period between 1700 and 1749, the total corpus size (and hence, the token
frequency of to death) was substantially lower than for other bins. To ensure that any observed
differences in the number of verb types that collocate with to death across bins are not affected
by large differences in sample size, we decided to cap the maximum number of tokens sampled
per bin at 800.</p>
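        <p>The binning and the two sampling caps described above might be combined as in the following sketch. The record format, the fixed random seed, the iterative trimming rule for the 25% author cap, and the order of the two caps (author share first, bin size second) are all our assumptions; the text does not specify how the sampling was implemented.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random
from collections import defaultdict
from typing import Dict, List

# Hypothetical record format: {"year": 1705, "author": "...", "text": "..."}
Instance = Dict[str, object]


def bin_start(year: int, first: int = 1550, width: int = 50) -> int:
    """Map a year to the first year of its fixed-width bin (1550, 1600, ..., 1900)."""
    return first + width * ((year - first) // width)


def cap_author_share(items: List[Instance], max_share: float,
                     rng: random.Random) -> List[Instance]:
    """Randomly trim over-represented authors until none exceeds `max_share`.

    Trimming shrinks the bin, which lowers the allowed count per author, so the
    check is repeated until it holds. With very few authors the floor of one
    instance per author can still exceed the share.
    """
    by_author: Dict[object, List[Instance]] = defaultdict(list)
    for item in items:
        by_author[item["author"]].append(item)
    for group in by_author.values():
        rng.shuffle(group)  # so that trimming keeps a random subset
    while True:
        total = sum(len(g) for g in by_author.values())
        limit = max(1, int(max_share * total))
        over = [a for a, g in by_author.items() if len(g) > limit]
        if not over:
            break
        for author in over:
            by_author[author] = by_author[author][:limit]
    return [item for group in by_author.values() for item in group]


def build_dataset(items: List[Instance], max_share: float = 0.25,
                  max_per_bin: int = 800, seed: int = 0) -> Dict[int, List[Instance]]:
    """Bin instances by 50-year period, cap the author share, then cap the bin size."""
    rng = random.Random(seed)
    bins: Dict[int, List[Instance]] = defaultdict(list)
    for item in items:
        bins[bin_start(int(item["year"]))].append(item)
    dataset = {}
    for start, group in sorted(bins.items()):
        group = cap_author_share(group, max_share, rng)
        if len(group) > max_per_bin:
            group = rng.sample(group, max_per_bin)  # uniform downsampling
        dataset[start] = group
    return dataset
```
]]></preformat>
        <p>Since the 800-token cap is applied here by uniform random downsampling after the author cap, an author's share can drift slightly above 25% by chance; reversing the order of the two steps avoids this, at the cost of bins that may end up smaller than 800.</p>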
        <p>After removing any duplicates, we identified the verb that collocates with each instance of
to death by relying on part-of-speech tags. Each instance of to death was assumed to collocate
with the verb in closest proximity (using a window of 15 words). In a number of cases, no
collocate verb was found. These cases included instances where the copula be was
used in combination with an adjective (e.g. be frozen/sick to death); these were subsequently
corrected and included in the dataset. Cases where to death functioned as a prepositional
modifier of a noun (e.g. on her way to death), fixed expressions (e.g. from birth to death, be
nigh to death), and cases where the verb was illegible (e.g. And when my mother euen before my
sighte, Was (-) to death; 1550, EEBO) were discarded. In total, 109 examples were discarded.</p>
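        <p>The nearest-verb heuristic can be illustrated with the following sketch over POS-tagged sentences. The tag test (any tag starting with "V", as in Penn-Helsinki-style tag sets), the reading of the 15-word window as 15 tokens on either side of to death, and the tie-breaking in favour of the preceding verb are illustrative assumptions, not details given in the text; lemmatization of the returned verb is left out.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Callable, List, Optional, Tuple

Tagged = List[Tuple[str, str]]  # (token, POS tag) pairs for one sentence


def find_to_death(tagged: Tagged) -> List[int]:
    """Return the index of 'to' in every 'to death' bigram (case-insensitive)."""
    toks = [t.lower() for t, _ in tagged]
    return [i for i in range(len(toks) - 1)
            if toks[i] == "to" and toks[i + 1] == "death"]


def nearest_verb(tagged: Tagged, to_index: int, window: int = 15,
                 is_verb: Callable[[str], bool] = lambda tag: tag.startswith("V"),
                 ) -> Optional[str]:
    """Return the verb closest to 'to death' within `window` tokens on either side.

    Distance is counted leftwards from 'to' and rightwards from 'death'. On a
    tie the preceding verb wins, since resultative 'to death' usually follows
    its verb. Returns None when no verb is found in the window.
    """
    death_index = to_index + 1
    for dist in range(1, window + 1):
        left, right = to_index - dist, death_index + dist
        if left >= 0 and is_verb(tagged[left][1]):
            return tagged[left][0].lower()
        if right < len(tagged) and is_verb(tagged[right][1]):
            return tagged[right][0].lower()
    return None
```
]]></preformat>
        <p>With such tags, a copula construction like was frozen to death, where frozen is tagged as an adjective and was carries a BE tag, yields None, which is the kind of case that, according to the text, was subsequently corrected.</p>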
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Word Embeddings</title>
        <p>
          In order to capture semantic similarity between to death’s collocate verbs across time, we rely
on distributed meaning representations computed by the word2vec algorithm [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ]. We use the
entire corpus collection introduced in Section 3. Besides the pre-processing pipeline outlined in
Section 3, we applied the following additional pre-processing steps with the goal of improving
the quality of the resulting embeddings: we lower-cased the corpora, applied NFKD Unicode
normalization, removed non-alphanumeric tokens, replaced numbers by a code (e.g. &lt;NUM&gt;),
dropped punctuation, and substituted the long “s” character (ſ) with modern-day “s”.
        </p>
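        <p>The additional normalization steps can be sketched per token as below. This is an illustrative sketch, not the authors' code; the regular expression for numbers and the literal placeholder string used as the number code are assumptions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re
import unicodedata
from typing import List

NUM_TOKEN = "<NUM>"  # placeholder for numbers; the exact string is an assumption
_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")  # e.g. 1550, 3.9, 1,000


def normalize_tokens(tokens: List[str]) -> List[str]:
    """Apply the extra normalization steps used before training embeddings."""
    out = []
    for tok in tokens:
        tok = tok.replace("\u017f", "s")           # long s -> modern s
        tok = unicodedata.normalize("NFKD", tok)  # Unicode compatibility decomposition
        tok = tok.lower()
        if _NUMBER.fullmatch(tok):
            out.append(NUM_TOKEN)                  # numbers -> one shared code
        elif any(ch.isalnum() for ch in tok):
            out.append(tok)                        # keep tokens with a letter or digit
        # tokens without any letter or digit (punctuation) are dropped
    return out
```
]]></preformat>
        <p>NFKD already maps the long s (U+017F) to s through its compatibility decomposition, so the explicit replacement mainly documents the step; NFKD also splits accented letters into a base letter plus a combining mark, which stays attached to the token.</p>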
        <p>
          We trained 200-dimensional distributed representations using the gensim library [
          <xref ref-type="bibr" rid="ref43">43</xref>
          ]. We
employed the skip-gram objective, approximated with negative sampling and optimized using
a learning rate of 0.025 over 5 epochs, with a window size of 20 tokens, discarding words with
a frequency lower than 50.
        </p>
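        <p>For reference, the reported hyper-parameters can be expressed as keyword arguments of gensim's Word2Vec class, using gensim 4.x names (vector_size for the dimensionality, epochs for the number of passes). The number of negative samples is not reported; the value 5 below is gensim's default and an assumption.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, Iterable, List


def word2vec_params() -> Dict[str, object]:
    """Hyper-parameters from the text, spelled as gensim 4.x keyword arguments."""
    return {
        "vector_size": 200,  # embedding dimensionality
        "sg": 1,             # skip-gram objective
        "hs": 0,             # no hierarchical softmax ...
        "negative": 5,       # ... negative sampling instead (count assumed: gensim default)
        "alpha": 0.025,      # initial learning rate
        "epochs": 5,
        "min_count": 50,     # discard words occurring fewer than 50 times
        "window": 20,        # maximum distance to context words on each side
    }


def train_embeddings(sentences: Iterable[List[str]], workers: int = 4):
    """Train a skip-gram model; gensim is imported lazily so the config can be inspected without it."""
    from gensim.models import Word2Vec
    return Word2Vec(sentences=sentences, workers=workers, **word2vec_params())
```
]]></preformat>
        <p>gensim iterates over the corpus several times (once to build the vocabulary and once per epoch), so sentences must be a re-iterable object such as a list or a class with an __iter__ method, not a one-shot generator.</p>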
        <p>
          In order to validate the resulting embedding space, we ran a number of semantic similarity
benchmarks, which allow us to situate the quality of our embeddings with respect to the
state of the art. The employed benchmark datasets consist of sets of Present-day English word
pairs, each of which has been manually assigned a similarity score. The evaluation proceeds
by correlating these human judgments with the cosine similarities between the corresponding
vector representations, using the Spearman correlation coefficient.2 We compared our
embedding space with (i) 200-dimensional GloVe vectors [
          <xref ref-type="bibr" rid="ref39">39</xref>
          ] trained on 6B Wikipedia tokens,3 as
well as (ii) 300-dimensional word2vec vectors trained on the Google News dataset (about 100B
tokens), restricting the vocabulary of the embedding spaces to the intersection across spaces
and using the average word embedding vector for out-of-vocabulary words.4
        </p>
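        <p>The evaluation protocol can be sketched as follows: cosine similarities between word pairs are correlated with the human scores using Spearman's rank correlation, and out-of-vocabulary words fall back on the average vector. This is a self-contained approximation written with NumPy only; it is not the word-embedding-benchmarks package, whose interface is not reproduced here, and it leaves out the restriction of the vocabulary to the intersection across spaces (which would be applied to emb beforehand).</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Tuple

import numpy as np


def average_ranks(x: np.ndarray) -> np.ndarray:
    """1-based ranks, with tied values sharing their mean rank (as in Spearman's rho)."""
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(a, b) -> float:
    """Spearman correlation = Pearson correlation of the tie-averaged ranks."""
    ra = average_ranks(np.asarray(a, dtype=float))
    rb = average_ranks(np.asarray(b, dtype=float))
    return float(np.corrcoef(ra, rb)[0, 1])


def evaluate(emb: Dict[str, np.ndarray], pairs: List[Tuple[str, str, float]]) -> float:
    """Correlate human similarity scores with cosine similarities between word vectors."""
    mean_vec = np.mean(list(emb.values()), axis=0)  # fallback for OOV words

    def vec(w: str) -> np.ndarray:
        return emb.get(w, mean_vec)

    human, model = [], []
    for w1, w2, score in pairs:
        v1, v2 = vec(w1), vec(w2)
        human.append(score)
        model.append(float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))))
    return spearman(human, model)
```
]]></preformat>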
        <p>As Table 2 shows, our embedding space generates scores comparable to the GloVe space,
while lagging behind those generated by the word2vec space. Considering that our embedding
space is trained on a smaller dataset and covers a long period of historical English, we take
these results to validate the semantic similarity properties of the inferred word representations.
As a sanity check, Table 3 shows the 20 nearest neighbours of a selection of verbs from our
dataset of to death collocates based on cosine distance.</p>
        <p>2While it is obviously not ideal to evaluate our model with respect to a Present-day English reference point,
no human similarity judgements of this scale are available for historical English. In order to conduct at least
some sort of sanity check, we used the off-the-shelf Present-day English spaces.</p>
        <p>3The embeddings are available through the following url: https://nlp.stanford.edu/projects/glove/.</p>
        <p>
          4We use the software package word-embedding-benchmarks [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] in order to streamline the evaluation of the embedding spaces.
        </p>
        <table-wrap>
          <label>Table 3</label>
          <caption>
            <p>Nearest neighbours of a selection of to death collocate verbs (cosine similarity in parentheses).</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th colspan="2"/>
                <th colspan="3">mental verbs</th>
              </tr>
              <tr>
                <th>stab</th>
                <th>whip</th>
                <th>amuse</th>
                <th>scare</th>
                <th>vex</th>
              </tr>
            </thead>
            <tbody>
              <tr><td>strangle (0.59)</td><td>cudgel (0.69)</td><td>delude (0.73)</td><td>frighten (0.78)</td><td>afflict (0.72)</td></tr>
              <tr><td>knife (0.59)</td><td>bludgeon (0.66)</td><td>flatter (0.63)</td><td>terrify (0.73)</td><td>perplex (0.72)</td></tr>
              <tr><td>bleed (0.58)</td><td>lash (0.66)</td><td>perplex (0.61)</td><td>startle (0.67)</td><td>harass (0.71)</td></tr>
              <tr><td>slash (0.58)</td><td>kick (0.59)</td><td>terrify (0.60)</td><td>worry (0.55)</td><td>annoy (0.69)</td></tr>
              <tr><td>bang (0.56)</td><td>cuff (0.57)</td><td>frighten (0.60)</td><td>drive (0.54)</td><td>oppress (0.69)</td></tr>
              <tr><td>kill (0.55)</td><td>spur (0.57)</td><td>tickle (0.58)</td><td>sweep (0.52)</td><td>fret (0.67)</td></tr>
              <tr><td>poison (0.55)</td><td>flog (0.56)</td><td>harass (0.54)</td><td>delude (0.51)</td><td>grieve (0.64)</td></tr>
              <tr><td>bite (0.55)</td><td>bang (0.55)</td><td>tire (0.54)</td><td>astonish (0.51)</td><td>terrify (0.61)</td></tr>
              <tr><td>cudgel (0.54)</td><td>goad (0.55)</td><td>annoy (0.52)</td><td>annoy (0.50)</td><td>pester (0.60)</td></tr>
              <tr><td>prick (0.54)</td><td>scourge (0.54)</td><td>vex (0.51)</td><td>amuse (0.50)</td><td>worry (0.58)</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <sec>
          <title>4. Method</title>
          <p>A basic way of quantifying the host class expansion of to death is by examining the change in
diversity in the set of attested collocate verbs over time. One such index of diversity is given
by type frequency – shown in the last row of Table 1. However, while such diversification is
potentially indicative of host class expansion, changes in type frequencies (or lexical diversity)
need not indicate that to death has indeed undergone semantic change; as argued in Section 2,
they may equally be indicative of cultural change. To probe the host class expansion of
to death, Section 4.1 operationalizes the process as a change in the structure of the semantic
space that the collocate verbs of to death occupy. We rely on hierarchical cluster analysis over
distributed meaning representations in order to not only incorporate a notion of lexical diversity
into the analysis but, crucially, also take ‘semantic diversity’ into account.</p>
          <p>As explained in Section 2, the host class expansion of to death also involved increased
co-occurrence with verbs with progressively more positive connotations. In order to capture this
process, Section 4.2 devises a way to quantify the average polarity of verbs over time using
word embeddings, and to statistically describe any existing ‘positivization’ process.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>4.1. Cluster Analysis</title>
        <p>
          At any given period, we inspect the semantic space delineated by the distribution of attested
verbs using a hierarchical cluster analysis. A known problem with automated cluster analysis
of semantic spaces is that the induced semantic clusters are not always easy to interpret. As a
result, their application in subsequent steps of the research workflow may require manual
fine-tuning and post-filtering [
          <xref ref-type="bibr" rid="ref40 ref41">41, 40</xref>
          ] to ensure that clusters are meaningful before any measures
of interest can be computed. In contrast, the chosen procedure dispenses with manual
fine-tuning and inspection of the resulting clusters. First, we identify a clustering metric that
aligns with the expectations of the host-class expansion process. Second, we automatically
find the hyper-parameter values that optimize the selected clustering metric. Finally, we
treat these optimal values as statistical correlates of the host-class expansion process that we
are ultimately interested in describing.
        </p>
        <p>
          As to death develops new, non-literal meanings, we expect the semantic space defined by the
verbs appearing in this construction to expand, with existing clusters of collocates becoming
denser and new clusters representing novel semantic fields starting to form. The silhouette score
[
          <xref ref-type="bibr" rid="ref45">45</xref>
          ] – a common clustering evaluation metric – may help determine whether this expectation
holds up. More specifically, we use the optimal number of clusters based on the silhouette score
as the target statistic for monitoring the process.
        </p>
        <p>For a word w<sub>i</sub> assigned to cluster C<sub>i</sub>, the silhouette score decomposes into the quantities
a(w<sub>i</sub>) – shown in Equation 1 – and b(w<sub>i</sub>) – shown in Equation 2. By measuring the average
intra-cluster distance between a word and all other words in the same cluster, a(w<sub>i</sub>) captures
the tightness of the clusters induced by a clustering algorithm. In contrast, b(w<sub>i</sub>) measures the
smallest average distance between the word and the members of any other cluster. The dataset-level
aggregate of b(w<sub>i</sub>), thus, captures the overall separation between clusters.</p>
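        <disp-formula>
          <label>(1)</label>
          <tex-math>a(w_i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\, j \neq i} \operatorname{cosdist}(w_i, w_j)</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(2)</label>
          <tex-math>b(w_i) = \min_{k \neq i} \frac{1}{|C_k|} \sum_{j \in C_k} \operatorname{cosdist}(w_i, w_j)</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(3)</label>
          <tex-math>s(w_i) = \frac{b(w_i) - a(w_i)}{\max\left(a(w_i), b(w_i)\right)}</tex-math>
        </disp-formula>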
        <p>The final silhouette score for a given word is computed by combining both
quantities and dividing by a normalizing factor, which ensures a constant output range between −1 and 1,
as shown in Equation 3. The silhouette score of a clustering as a whole is the mean of s(w<sub>i</sub>) over all words.</p>
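        <p>Equations 1–3 translate directly into code. The sketch below is a plain NumPy version written for illustration (scikit-learn's silhouette functions compute the same quantity); it assumes at least two clusters and, following scikit-learn's convention, assigns a score of 0 to words in singleton clusters.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def cosine_distances(X: np.ndarray) -> np.ndarray:
    """Pairwise cosine distances 1 - cos(x_i, x_j), with zeros on the diagonal."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    D = 1.0 - Xn @ Xn.T
    np.fill_diagonal(D, 0.0)
    return D


def silhouette_samples(D: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-word s(w_i) following Equations 1-3."""
    n = len(labels)
    s = np.zeros(n)
    clusters = np.unique(labels)
    for i in range(n):
        same = labels == labels[i]
        size = same.sum()
        if size == 1:
            continue  # singleton cluster: s = 0 by convention
        a = D[i, same].sum() / (size - 1)                 # Eq. 1 (D[i, i] is 0)
        b = min(D[i, labels == k].mean()                  # Eq. 2
                for k in clusters if k != labels[i])
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom     # Eq. 3
    return s


def mean_silhouette(D: np.ndarray, labels) -> float:
    """Score of a whole clustering: the mean of s(w_i) over all words."""
    return float(silhouette_samples(D, np.asarray(labels)).mean())
```
]]></preformat>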
        <p>One risk that can be linked to the presented methodology is that the optimal number of
clusters may increase because the number of unique verb types in the sample has increased (as
shown in Section 3) – i.e. regardless of the semantic composition of the space representing that
bin. Thus, increases in the optimal number of clusters can be due to sampling artifacts – an
issue that becomes even more likely with fat-tailed distributions that are common in linguistic
data. Moreover, even in the absence of sampling artifacts, we must ensure that we are not
simply measuring increases in type frequency-based diversity, which, as already argued, are
not necessarily indicative of linguistic change.</p>
        <p>In order to remedy the aforementioned issue, we employ the following bootstrap procedure.
For each period, we sample 500 verbs with replacement from the multinomial distribution
observed in the dataset and compute the optimal number of clusters based on silhouette score.
Repeating this process 1,000 times per period yields a dataset with 8,000 observations (i.e.
for 8 periods), which we submit to statistical analysis in order to quantify the effect of time on
the optimal number of clusters. Crucially, we record the total number of distinct verbs sampled
in each bootstrap iteration, which allows us to statistically control for the effect of population
size on the obtained optimal number of clusters.</p>
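        <p>The bootstrap loop can be sketched as follows for a single period. Drawing verbs with replacement in proportion to their observed frequencies amounts to sampling from the empirical multinomial distribution. The optimal_k argument is a hypothetical placeholder for the silhouette-based search over the merge tree, and the seed is arbitrary.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

import numpy as np


def bootstrap_period(verbs: Sequence[str],
                     optimal_k: Callable[[List[str]], int],
                     n_draws: int = 500, n_iter: int = 1000,
                     seed: int = 0) -> List[Dict[str, int]]:
    """Resample a period's verbs; record the optimal k and the number of distinct verbs."""
    rng = np.random.default_rng(seed)
    counts = Counter(verbs)
    types = list(counts)
    probs = np.array([counts[t] for t in types], dtype=float)
    probs /= probs.sum()  # empirical multinomial over verb types
    rows = []
    for _ in range(n_iter):
        idx = rng.choice(len(types), size=n_draws, replace=True, p=probs)
        sample = [types[i] for i in idx]
        rows.append({"k": optimal_k(sample),           # outcome
                     "n_types": len(set(sample))})     # control: population size
    return rows
```
]]></preformat>
        <p>Running the function for each of the 8 periods and concatenating the rows gives the 8,000 observations mentioned above; n_types is the distinct-verb count that can enter the regression as a control for population size.</p>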
        <p>We rely on hierarchical (agglomerative) clustering using cosine distance and complete
linkage,5 and optimize the number of clusters by inspecting the silhouette scores at different
nodes in the induced merge tree until reaching the merge step that maximizes the silhouette
score.6</p>
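        <p>The search over the merge tree can be sketched without external dependencies as follows. The authors use scikit-learn's implementations; this NumPy-only version is an illustrative substitute that builds the complete-linkage merge tree over cosine distances, scores every cut from n−1 down to 2 clusters, and returns the cut with the highest mean silhouette. It needs at least three items.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def cosine_distances(X: np.ndarray) -> np.ndarray:
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    D = np.clip(1.0 - Xn @ Xn.T, 0.0, 2.0)  # clip tiny negative rounding errors
    np.fill_diagonal(D, 0.0)
    return D


def mean_silhouette(D: np.ndarray, labels: np.ndarray) -> float:
    """Mean of s(w_i) from Equations 1-3; words in singleton clusters score 0."""
    scores = []
    for i, li in enumerate(labels):
        same = labels == li
        if same.sum() == 1:
            scores.append(0.0)
            continue
        a = D[i, same].sum() / (same.sum() - 1)
        b = min(D[i, labels == k].mean() for k in np.unique(labels) if k != li)
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return float(np.mean(scores))


def best_cut(X):
    """Complete-linkage clustering; return (k, score, labels) of the best-scoring cut."""
    D = cosine_distances(np.asarray(X, dtype=float))
    n = len(D)
    clusters = [[i] for i in range(n)]
    best_score, best_k, best_labels = -np.inf, None, None
    while len(clusters) > 2:
        # Complete linkage: distance between clusters = largest pairwise distance.
        _, a, b = min(
            (D[np.ix_(p, q)].max(), x, y)
            for x, p in enumerate(clusters)
            for y, q in enumerate(clusters)
            if x < y
        )
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a] = clusters[a] + merged
        labels = np.empty(n, dtype=int)
        for c, members in enumerate(clusters):
            labels[members] = c
        score = mean_silhouette(D, labels)  # score this node of the merge tree
        if score > best_score:
            best_score, best_k, best_labels = score, len(clusters), labels
    return best_k, best_score, best_labels
```
]]></preformat>
        <p>The naive search recomputes all cluster-pair distances at every merge, so it is meant for clarity rather than speed. With scikit-learn, a comparable result could be obtained by fitting AgglomerativeClustering(n_clusters=k, metric="cosine", linkage="complete") for each k and scoring it with sklearn.metrics.silhouette_score(X, labels, metric="cosine"); in versions before 1.2 the clustering parameter is called affinity rather than metric.</p>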
        <p>5We made these choices on the basis of a single manual scan of the interpretability of the clusters induced
from the verbs in the entire dataset.</p>
        <p>
          6We use the reference implementations provided by the Python library scikit-learn [
          <xref ref-type="bibr" rid="ref38">38</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>4.2. Sentiment Analysis</title>
        <p>Homing in on the increasing positivity of to death, we leverage the embedding space described
in Section 3.2 in order to capture the sentiment polarity of the sampled verbs. Differences in
sentiment are not straightforwardly captured by means of hierarchical clustering, as antonyms
are represented by highly similar vectors. In Table 3, for instance, the positive mental verb
amuse is recognized as being similar to more negative mental verbs like delude and terrify, as
well as to its antonyms annoy and vex. The cluster analysis is therefore supplemented by means
of sentiment scores.</p>
        <p>A first approach to induce word-level sentiment scores is to exploit the proximity of a given
verb vector to the vectors for the words ‘good’ and ‘bad’: the closer a verb is to the vector for ‘good’,
the more positive its sentiment. However, similar confounding effects from antonyms
make this approach infeasible. Indeed, in common word embedding spaces the vectors for
‘good’ and ‘bad’ tend to be located in the proximity of each other, and, thus, lack discriminative
power for classifying words with respect to their sentiment.</p>
        <p>
          In order to tackle this issue, post-hoc modifications of the embedding space such as
retrofitting [
          <xref ref-type="bibr" rid="ref20">15</xref>
          ] or word embedding refinement [
          <xref ref-type="bibr" rid="ref53">53</xref>
          ] could allow us to leverage sentiment lexicons in
order to ensure the desired property. In the present work, however, we dispense with the manual
work that such an approach would require and resort to a second-order approach that induces
sentiment scores on the basis of the proximity of verbs to a filtered list of nearest neighbours
of ‘good’ and ‘bad’. By manually filtering these lists, we avoid terms that may confound the
polarities, while still keeping the manual work to a minimum. More specifically, we sift
through the vocabulary in ranked order by cosine similarity to ‘good’ and ‘bad’, and discard
confounding words until reaching a total of 20 words per polarity.7 For a given word w<sub>i</sub>, we
then compute its sentiment score as shown in Equation 4:
        </p>
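        <p>
          <disp-formula>
            <label>(4)</label>
            <tex-math>S(w_i) = \frac{1}{|N_{\mathrm{good}}|} \sum_{w_j \in N_{\mathrm{good}}} \cos(w_i, w_j) - \frac{1}{|N_{\mathrm{bad}}|} \sum_{w_j \in N_{\mathrm{bad}}} \cos(w_i, w_j)</tex-math>
          </disp-formula>
where N<sub>good</sub> and N<sub>bad</sub> refer, respectively, to the filtered sets of nearest neighbours of ‘good’ and
‘bad’.</p>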
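        <p>A minimal Python sketch of the neighbour filtering and of Equation 4, assuming the embeddings are available as a dictionary mapping words to NumPy vectors. The helper names (cosine, nearest_neighbours, sentiment_score), the discard set standing in for the manual filtering decisions, and the toy two-dimensional vectors are all hypothetical illustrations, not the code used in the study; the study keeps 20 words per polarity, whereas the toy example keeps 2.</p>

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def nearest_neighbours(seed, emb, k, discard=frozenset()):
    """Rank the vocabulary by cosine similarity to `seed` and return the top k,
    skipping the seed itself and any word in `discard` (a stand-in for the
    words a human annotator rejected as confounding)."""
    candidates = [w for w in emb if w != seed and w not in discard]
    candidates.sort(key=lambda w: cosine(emb[w], emb[seed]), reverse=True)
    return candidates[:k]


def sentiment_score(word, emb, n_good, n_bad):
    """Equation 4: mean cosine similarity to the filtered 'good' neighbours
    minus mean cosine similarity to the filtered 'bad' neighbours."""
    pos = np.mean([cosine(emb[word], emb[w]) for w in n_good])
    neg = np.mean([cosine(emb[word], emb[w]) for w in n_bad])
    return float(pos - neg)


# Hypothetical toy 2-D "embeddings"; real vectors come from the trained model
emb = {
    "good": np.array([1.0, 0.0]),
    "great": np.array([0.95, 0.05]),
    "excellent": np.array([0.9, 0.1]),
    "bad": np.array([0.0, 1.0]),
    "terrible": np.array([0.05, 0.95]),
    "awful": np.array([0.1, 0.9]),
    "amuse": np.array([0.8, 0.2]),
    "terrify": np.array([0.2, 0.8]),
}

n_good = nearest_neighbours("good", emb, k=2, discard={"bad"})
n_bad = nearest_neighbours("bad", emb, k=2, discard={"good"})
for verb in ("amuse", "terrify"):
    print(verb, round(sentiment_score(verb, emb, n_good, n_bad), 3))
```

        <p>Averaging over a set of neighbours, rather than comparing a verb with the single vectors for ‘good’ and ‘bad’, is what makes the score less sensitive to the proximity of the two seed words themselves.</p>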
        <p>To test the effect of time on the polarity of to death’s collocates, we assign each verb in the
dataset to the bin in which it is first attested. Since grammaticalizing structures often retain
their original function, the well-established negative use of to death may vastly outnumber, and
hence overshadow, cases where to death has expanded to intensify new, more positive verbs.
We therefore suggest that working with the sentiment of collocate verbs first attested in a given
bin, rather than with the distribution of sentiment across all verbs in each bin, captures the
ongoing changes more directly and robustly.</p>
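        <p>The first-attestation binning can be sketched in a few lines of Python. The sketch assumes the dataset is a list of (year, verb) pairs and that bins are 50 years wide, as bin labels such as 1700-1749 suggest; the function names, toy records and sentiment values are hypothetical.</p>

```python
from collections import defaultdict
from statistics import mean


def bin_start(year, width=50):
    """Map a year to the first year of its bin, e.g. 1723 -> 1700."""
    return (year // width) * width


def first_attestation_bins(records, width=50):
    """Assign each verb to the bin in which it is first attested."""
    first_year = {}
    for year, verb in records:
        first_year[verb] = min(year, first_year.get(verb, year))
    return {verb: bin_start(year, width) for verb, year in first_year.items()}


def mean_sentiment_of_new_verbs(records, sentiment, width=50):
    """Average sentiment of the verbs that first appear in each bin."""
    by_bin = defaultdict(list)
    for verb, start in first_attestation_bins(records, width).items():
        by_bin[start].append(sentiment[verb])
    return {start: mean(scores) for start, scores in sorted(by_bin.items())}


# Hypothetical (year, verb) attestations and word-level sentiment scores
records = [(1620, "beat"), (1655, "starve"), (1760, "beat"), (1770, "bore"),
           (1790, "please"), (1810, "love"), (1820, "bore")]
sentiment = {"beat": -0.6, "starve": -0.5, "bore": -0.2, "please": 0.4, "love": 0.6}

print(mean_sentiment_of_new_verbs(records, sentiment))
```

        <p>Note that the 1760 attestation of beat does not count towards the 1750 bin: only verbs that are new to a bin contribute to its average, so a large stock of long-established negative collocates cannot mask the arrival of positive ones.</p>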
      </sec>
      <sec id="sec-3-5">
        <title>4.3. Statistical Modeling</title>
        <p>In order to assess the effect of time on the semantic structure of the attested verbs, as well
as on their overall sentiment, we fit linear regression models that regress the target outcome
(i.e. the optimal number of clusters or the sentiment score) on time period. We use a Gaussian
likelihood for both outcomes.</p>
        <p><sup>7</sup> These filtered nearest neighbours were checked in order to avoid overly specific terms with
unstable sentiment polarity over time. For example, the top 5 neighbours of ‘good’ were ‘better’, ‘excellent’,
‘great’, ‘well’ and ‘best’, while the top 5 neighbours of ‘bad’ were ‘dangerous’, ‘ill’, ‘inefficient’, ‘wrong’ and ‘hard’.</p>
        <p>
          A further modeling choice we make is to incorporate time period as a monotonic effect,
rather than as, for instance, an ordinary linear predictor. This choice is motivated by the fact
that diachronic processes in language structure often result in patterns that resemble S-curves
[
          <xref ref-type="bibr" rid="ref12 ref6">12, 6</xref>
          ]. In such patterns, the size of the change between adjacent time periods varies over time,
which an ordinary linear predictor, assuming the same change per period, cannot capture. A
monotonic predictor shares with a linear predictor the assumption that the direction of the effect
is constant (strictly positive or strictly negative), but allows the size of the effect to differ
between adjacent time periods.
        </p>
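        <p>A small numerical illustration of this point, using a hypothetical logistic S-curve over 8 time bins (the steepness and midpoint are arbitrary choices): every step between adjacent bins points in the same direction, but the step sizes differ greatly, so a straight line leaves a systematic residual pattern.</p>

```python
import numpy as np

# Hypothetical logistic S-curve over 8 time bins, steepest between bins 3 and 4
bins = np.arange(8)
s_curve = 1.0 / (1.0 + np.exp(-2.0 * (bins - 3.5)))

steps = np.diff(s_curve)          # change between adjacent bins
print(np.round(steps, 3))         # all positive, but of very unequal size

# An ordinary linear predictor forces every step to be the same size
slope, intercept = np.polyfit(bins, s_curve, deg=1)
residuals = s_curve - (intercept + slope * bins)
print(np.round(residuals, 3))     # systematic pattern the straight line misses

# Normalizing the steps gives shares that sum to 1, which is the kind of
# quantity a monotonic effect estimates for each pair of adjacent bins
shares = steps / steps.sum()
print(np.round(shares, 3), shares.sum())
```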
        <p>
          Our implementation of the monotonic predictor follows Bürkner and Charpentier [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. For a
given predictor with n possible categories (in our case, the 8 time bins) to be modelled as a
monotonic effect, this approach introduces n − 1 parameters ζ<sub>i</sub> such that ζ<sub>i</sub> ∈ [0, 1]
and ∑<sub>i=1</sub><sup>n−1</sup> ζ<sub>i</sub> = 1, with ζ<sub>0</sub> fixed at 0. For a given observation of the jth time bin, the
monotonic predictor term η is given by Equation 5:
        </p>
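        <p>Written out following the formulation of Bürkner and Charpentier [8], and assuming that the time bins are indexed j = 0, …, n − 1, the monotonic term can be sketched as follows (a sketch under these assumptions):</p>
        <disp-formula>
          <tex-math>\eta_j = b \, \mathrm{mo}(j, \zeta), \qquad \mathrm{mo}(j, \zeta) = (n-1) \sum_{i=0}^{j} \zeta_i, \qquad \zeta_0 = 0, \quad \sum_{i=1}^{n-1} \zeta_i = 1</tex-math>
        </disp-formula>
        <p>Because ζ<sub>0</sub> = 0 and the remaining ζ<sub>i</sub> sum to 1, mo(0, ζ) = 0 and mo(n − 1, ζ) = n − 1: b can then be read as the average change between adjacent bins, and (n − 1)b as the total change from the first to the last bin.</p>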
        <p>Here, b plays the role of an ordinary linear coefficient, representing in this case the direction
and size of the effect on the outcome, while the individual ζ<sub>i</sub> represent the normalized distances
between consecutive predictor categories. The predictor term η is then added to the linear
model in the usual way: y = a + η. Once fitted, a monotonic predictor is easy to interpret by
inspecting the values assigned to the ζ<sub>i</sub> parameters, since each of them corresponds to the
share of the total change, from the first to the last category, that occurs between two adjacent
categories.</p>
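        <p>The computation of the monotonic term, and its interpretation, can be sketched with NumPy. This is an illustration assuming the Bürkner and Charpentier formulation; the function name monotonic_term and the values of b and ζ are hypothetical and do not correspond to estimates from this study. In brms itself, such a term is requested with mo() in the model formula.</p>

```python
import numpy as np


def monotonic_term(j, b, zeta):
    """eta for an observation in bin j (0-based), following the Bürkner and
    Charpentier formulation: b * (n - 1) * sum_{i=0}^{j} zeta_i.

    `zeta` holds zeta_1 .. zeta_{n-1}; zeta_0 is fixed at 0."""
    zeta = np.asarray(zeta, dtype=float)
    if not (np.isclose(zeta.sum(), 1.0) and zeta.min() >= 0):
        raise ValueError("zeta must be a simplex: non-negative, summing to 1")
    # Prepend zeta_0 = 0 so that index j holds the sum of zeta_0 .. zeta_j
    cumulative = np.concatenate([[0.0], np.cumsum(zeta)])
    return b * len(zeta) * cumulative[j]


# Illustrative values only: 8 bins, hence 7 zetas, with most of the change
# concentrated between bins 3 and 4
zeta = [0.05, 0.05, 0.05, 0.45, 0.2, 0.1, 0.1]
b = 0.5
eta = np.array([monotonic_term(j, b, zeta) for j in range(8)])
print(np.round(eta, 3))

# Reading the fit: each step's share of the total change recovers zeta
print(np.round(np.diff(eta) / (eta[-1] - eta[0]), 3))
```

        <p>An ordinary linear predictor corresponds to the special case in which all ζ<sub>i</sub> equal 1/(n − 1), so that every step has the same size b.</p>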
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Results</title>
      <p>
        We deploy a Bayesian regression framework, which allows us to inspect the uncertainty in the
statistical parameters of interest in an intuitive, probabilistic manner. We fit our models using
the Hamiltonian Monte Carlo sampler provided by Stan [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] through the R package brms [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <sec id="sec-4-1">
        <title>5.1. Cluster Analysis</title>
        <p>In order to test the monotonicity of the effect, we compare a linear model of the effect of time
period on the optimal number of clusters, LINEAR(P), with the monotonic effect model,
MONO(P). Moreover, in order to control for the influence of the size of the sampled population
on the outcome, we fit two additional models that include the number of unique verbs in the
bootstrap sample as a predictor: LINEAR(P)+S and MONO(P)+S.</p>
        <p>We compare the four models using the Widely Applicable Information Criterion (WAIC),
which estimates a model’s out-of-sample predictive performance while penalizing its complexity
(and hence overfitting). The results of the comparison are shown in the top row of Table 4.
Including time period as a monotonic effect improves the predictive power of the model over
the linear effect. Controlling for sample size matters even more: including it yields a larger
improvement in WAIC than modeling period as a monotonic effect.</p>
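        <p>To make the comparison criterion concrete, the sketch below computes WAIC from an S × N matrix of pointwise log-likelihoods (S posterior draws, N observations), using the common definition WAIC = −2(lppd − p<sub>WAIC</sub>), where p<sub>WAIC</sub> sums the posterior variance of each observation’s log-likelihood. It is a generic illustration with simulated data, not the code used in the study; in R, brms and the loo package provide their own waic() implementations.</p>

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm


def waic(log_lik):
    """WAIC on the deviance scale (lower is better).

    log_lik: array of shape (S, N) holding log p(y_n | theta_s) for
    S posterior draws and N observations."""
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    # Log pointwise predictive density: log of the draw-averaged likelihood
    lppd = np.sum(logsumexp(log_lik, axis=0) - np.log(n_draws))
    # Effective number of parameters: posterior variance of each log-likelihood
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic)


# Simulated data and two sets of simulated "posterior draws" for a Gaussian mean
rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=50)
draws_good = rng.normal(0.0, 0.1, size=(2000, 1))   # mean close to the truth
draws_bad = rng.normal(2.0, 0.1, size=(2000, 1))    # badly misspecified mean
ll_good = norm.logpdf(y[None, :], loc=draws_good, scale=1.0)
ll_bad = norm.logpdf(y[None, :], loc=draws_bad, scale=1.0)
print(waic(ll_good), waic(ll_bad))
```

        <p>The better-specified model receives the lower WAIC; in practice, differences between models are best judged together with their standard errors.</p>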
        <p>Using the best-performing model, MONO(P)+S, we can visualize the (monotonic) effect of time
period on the optimal number of clusters by means of the posterior predictive distribution.</p>
        <p>Figure 1 depicts the posterior predictive distribution of the optimal number of clusters in a
counterfactual triptych plot, statistically controlling for sample size at different percentiles.
Overall, we observe a clear monotonic effect resembling an S-curve, with a leap starting in the
1750 bin. The shape of the effect remains stable across the three sample-size percentiles. Due
to the positive linear effect of sample size on the optimal number of clusters, the range of the
outcome (i.e. the y-axis) increases across the plots in the triptych. Moreover, the distribution
of uncertainty varies from plot to plot. At smaller sample sizes, the uncertainty in the predicted
number of clusters is larger in the later time bins, whereas at larger sample sizes the most
uncertain predictions come from the earlier bins. This is likely because, as Figure 2 shows,
the sample size in pre-1800 bins is always smaller than in post-1800 bins. Nevertheless, by
counterfactually controlling for sample size, we can observe that the statistical model predicts
a constant effect shape regardless of sample size.</p>
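        <p>The logic of the counterfactual triptych can be sketched as follows: posterior draws of the intercept a, the monotonic coefficient b, the simplex ζ, a sample-size slope c and the residual standard deviation σ are combined into predictions for every time bin, while sample size is held fixed at a chosen percentile of the observed sizes. All draws and sizes below are simulated placeholders, and the additive form a + b·mo(j, ζ) + c·size with Gaussian noise is an assumption modelled on the MONO(P)+S specification described above.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
S, n_bins = 4000, 8

# Simulated stand-ins for posterior draws of a MONO(P)+S model (hypothetical values)
draws = {
    "a": rng.normal(2.0, 0.1, S),                      # intercept
    "b": rng.normal(0.5, 0.05, S),                     # monotonic coefficient
    "zeta": rng.dirichlet([1, 1, 1, 5, 2, 1, 1], S),   # simplex, n_bins - 1 columns
    "c": rng.normal(0.01, 0.001, S),                   # slope for sample size
    "sigma": np.abs(rng.normal(0.3, 0.03, S)),         # residual standard deviation
}
observed_sizes = rng.integers(50, 400, size=200)       # unique verbs per bootstrap sample


def mo(j, zeta):
    """(n - 1) * sum_{i=0}^{j} zeta_i for every draw, with zeta_0 = 0."""
    zeros = np.zeros((zeta.shape[0], 1))
    cumulative = np.concatenate([zeros, np.cumsum(zeta, axis=1)], axis=1)
    return zeta.shape[1] * cumulative[:, j]


def counterfactual_summary(draws, sizes, percentile, n_bins):
    """Posterior predictive mean and 90% interval for each bin, with sample
    size held fixed at the given percentile of the observed sizes."""
    size = np.percentile(sizes, percentile)
    rows = []
    for j in range(n_bins):
        mu = draws["a"] + draws["b"] * mo(j, draws["zeta"]) + draws["c"] * size
        y_rep = rng.normal(mu, draws["sigma"])         # add Gaussian observation noise
        rows.append((y_rep.mean(), *np.percentile(y_rep, [5, 95])))
    return np.array(rows)


# One panel of the triptych per sample-size percentile
panels = {p: counterfactual_summary(draws, observed_sizes, p, n_bins) for p in (10, 50, 90)}
for p, table in panels.items():
    print(p, np.round(table[:, 0], 2))
```

        <p>Because the same draws of a, b and ζ are reused in every panel, only the sample-size term shifts the curves, which is why the shape of the effect stays constant while the range of the outcome grows from panel to panel.</p>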
      </sec>
      <sec id="sec-4-2">
        <title>5.2. Sentiment Analysis</title>
        <p>Similarly to the experiments in Section 5.1, we now compare the effect of time period on
sentiment using a linear predictor, LINEAR(P), and a monotonic effect, MONO(P). We use
the standardized average sentiment polarity of the verbs as the outcome. The results in terms
of WAIC are shown in the bottom row of Table 4. Modeling time as a monotonic effect
produces an improvement over the linear predictor, although the difference with respect to
the linear effect model is smaller here than in the cluster analysis experiments. The left
plot in Figure 3 does indicate a slight jump starting in the 1750 bin. However, the wide
credible intervals do not rule out a merely linear effect. Moreover, as the right-hand plot of
Figure 3 shows, a considerable amount of variance in the dataset is left unexplained by the
model. While statistically controlling for other predictors, such as document topic or genre,
could improve the fit, the current model does show a predominantly linear upward effect of
time on average sentiment, of moderate size (about 1 standard deviation).</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Discussion</title>
      <p>
        The results of the statistical analyses are in line with expectations in that the optimal number
of verb clusters increases substantially over the course of the 18th century, when the meaning
of to death expanded to non-literal, intensifying uses (STAGE 2). The predicted shift away
from negative polarity (STAGE 3) also appears to be captured by the statistical model, albeit
weakly. However, since to death is still predominantly attested with negative collocates even
in present-day English [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ], a weak trend is in line with the pathway outlined in Section 2. All in all,
then, the procedure adopted here appears promising for future case-driven, ‘hands-off’ investigations.
      </p>
      <p>
        With an eye to aiding future applications of the models and methods adopted in the present
study, we highlight some important remaining problems.
      </p>
      <sec>
        <title>6.1. Data sparsity and balance</title>
        <p>
          While there is no shortage of historical English corpora, corpora that span the entire period
from Early Modern English up to Present-Day English are rare. A notable exception is the suite
of Penn-Helsinki Corpora [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ], which, although wide in temporal scope, is still very limited in size, and thus
of limited use for the ‘data-hungry’ models currently employed in computational studies of
semantic change.<sup>8</sup> To maximize sample sizes, this study (following Margerie [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ]) combined large corpora covering different time windows.
        </p>
        <p>
          An issue with this patchwork solution is that individual time bins are likely not represented
by a comparable number of texts and text types, which may have consequences at later steps
in the procedure. In the present case, the patchwork corpus suffered from data sparsity in
the 1700–1749 bin, which in turn forced us to cap the number of tokens per bin.
Furthermore, because text types are labelled inconsistently across the different corpora, it is
very difficult, if not impossible, to ensure register and genre consistency across bins. For the
present case, such text type inconsistency is particularly unfortunate: the time bin in which the
host class expansion of to death takes off also appears to be the bin in which the COHA corpus
starts, which introduces newspaper and magazine texts into the sample.<sup>9</sup> At the same time,
some of the text collections included in the patchwork corpus (such as ECCO) may contain
reprints of older texts, which may have led to an overrepresentation of older usages in certain
time bins. A limitation of the procedure presented here, then, is that it devotes relatively little
attention to balancing the data and/or controlling for text and text type variation across time
bins. A possible solution could be to refrain from working with corpus patchworks and turn to
the Google Books Corpus (1500–2008) or other large library dumps. Yet, even then, issues of
overrepresentation (and mislabelling) of texts and text types may remain [
          <xref ref-type="bibr" rid="ref37 ref52">52, 37</xref>
          ].<sup>10</sup>
        </p>
        <p>
          In short, a substantial challenge for case-driven research is that the careful data curation it
requires may lead to an impasse. Even when following strict procedural guidelines and sanity
checks [
          <xref ref-type="bibr" rid="ref14 ref49">49, 14</xref>
          ], artefactual results are still possible when data is uncurated or poorly balanced (as
also discussed for lexical semantic change by Hengchen et al. [20]). Furthermore, introducing
such balance may not be easy (or even possible), and it may also reduce sample size, which
complicates the study of less frequent phenomena.
        </p>
        <p>
          <sup>8</sup> A somewhat larger corpus covering a very wide time span is the OED quotations database, estimated at 35M
words [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. Besides its modest size (and very few attestations of to death [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ]), the OED quotations database is
affected by balancing issues similar to those described for the Google Books Corpus.
        </p>
        <p><sup>9</sup> In the 1800 bin, no tokens were drawn from newspaper texts, and 82 out of 800 tokens (10.25%) were
found in magazine texts.</p>
        <p>
          <sup>10</sup> Additionally, even diachronic trends in balanced diachronic corpora may, strictly speaking, be artefacts,
as genres and registers are themselves subject to change. With respect to newspaper and magazine texts, for instance, it
has been shown that the changing “readerships and purposes of magazines versus newspapers result in different
historical-linguistic patterns of use” [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
      </sec>
      <sec id="sec-5-1">
        <title>6.2. Minimizing manual interference</title>
        <p>The question of to what extent manual interference is needed or desirable in case-driven
studies of semantic change is far from trivial and, ultimately, still undecided. In the spirit of
the ‘data-drivenness’ discourse in preceding work [e.g. 41, 16], the procedure presented here
aimed to minimize manual filtering and annotation, but such manual interference has not
been entirely absent. In collecting the collocate verbs of to death, for instance, a substantial
number of cases involved structures where the collocate of interest is not the verb be but its
accompanying adjective. Similarly, cases where the verb form in closest proximity to to death
(e.g. we could prevent Scipio from pummel ling the dreaded wizard to death, COHA 1840)
were corrected manually.</p>
        <p>
          Furthermore, there are various points where further ‘manual meddling’ could be considered.
In many instances that were retained in the dataset, to death has neither resultative nor
intensifying meaning.<sup>11</sup> Given that such cases are of limited relevance to the development
under study yet may affect the output of the statistical analyses, it may be worth flagging or
even excluding them from the dataset, as done in Margerie [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] and Perek [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ]. Yet, such actions involve elaborate
manual annotation and potentially introduce annotator judgments into the procedure, which
may diminish the ‘data-driven’ character of the study.<sup>12</sup>
        </p>
        <p>Finally, with respect to the cluster analysis, a fully hands-off approach also implies that
we trust the word embedding space to reflect meaningful semantic parameters, and that the
resulting clusters capture, at least roughly, relevant properties of the underlying process. In
the present case, it is reassuring that the narrative emerging from our data analysis
appears to align with what earlier linguistic research has proposed. However, there is no
guarantee that the verb clusters fed into the statistical analysis correspond to the
semantic verb classes proposed in earlier research (e.g. actions of physical harm vs. mental
verbs), or even to any groupings that are meaningful to humans. Additionally, while the
bootstrapping procedure described in Section 4.1 makes the procedure more robust, it also
makes it more difficult to examine which verbs constitute which cluster at which point in
time. Given this lack of full transparency, the (as yet unanswered) question is how
to progress towards a method that reliably and robustly supports exploratory data analysis in
cases not yet analyzed [also see 49, 20], and to what extent limiting manual involvement to an
absolute minimum is warranted in specific case-driven studies.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>7. Conclusion and Future Outlook</title>
      <p>Drawing on the vast (and growing) body of computational research on semantic change, this
study examined how computational models can be employed to track the host class expansion
of grammaticalizing constructions, such as to death. By adjusting our scope to one specific and
relatively complex case, we have outlined a procedure that caters to case-driven research, which
operates at a level of specificity and granularity that is uncommon in computational
approaches to semantic change. Besides outlining the procedure, we have flagged its current
limitations, which will hopefully encourage further case-driven computational humanities
research that helps reflect on, and ultimately tackle, the challenges that remain.</p>
      <p><sup>11</sup> For instance, positive verb collocates such as love are attested earlier than expected in examples such as
He swore he wou’d love me to death (EEBO, 1700), where to death most likely functions as a time adverbial
(‘he swore he’d love me until death’) and not as a resultative (‘he swore he’d love me, resulting in death’)
or an intensifier (‘he swore he’d love me a lot’). These structures could, of course, have contributed to to
death’s acquisition of intensifying meaning (loving someone until death implies loving them a lot), and hence
be relevant to include. In other cases, however, the relevance of the query hit to the semantic
development described is much less clear (e.g. the first who turns his back to death; EEBO, 1800).</p>
      <p><sup>12</sup> A possible, yet costly, solution here would be to rely on multiple annotators, preferably with expertise in
the historical language variety at hand [20].</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>The work for this study has been made possible by the Platform Digital Infrastructure (Social
Sciences and Humanities) fund (PDI-SSH). We want to thank the anonymous reviewers for
their valuable suggestions, as well as Folgert Karsdorp for his advice regarding monotonic
effects.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bamler</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Mandt</surname>
          </string-name>
          . “
          <article-title>Dynamic word embeddings”</article-title>
          .
          <source>In: Proceedings of the 34th International Conference on Machine Learning</source>
          . Ed. by
          <string-name>
            <given-names>D.</given-names>
            <surname>Precup</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y. W.</given-names>
            <surname>Teh</surname>
          </string-name>
          . Vol.
          <volume>70</volume>
          .
          <source>Proceedings of Machine Learning Research. PMLR</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>380</fpage>
          -
          <lpage>389</lpage>
          . url: http://proceedings.mlr.press/v70/bamler17a.html.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Betti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Reynaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ossenkoppele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Oortwijn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Salway</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Bloem</surname>
          </string-name>
          . “
          <article-title>Expert Concept-Modeling Ground Truth Construction for Word Embeddings Evaluation in Concept-Focused Domains”</article-title>
          .
          <source>In: Proceedings of the 28th International Conference on Computational Linguistics</source>
          . Barcelona, Spain (Online):
          <source>International Committee on Computational Linguistics</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>6690</fpage>
          -
          <lpage>6702</lpage>
          . doi: 10.18653/v1/2020.coling-main.586.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Biber</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Gray</surname>
          </string-name>
          . “
          <article-title>Being Specific about Historical Change: The Influence of SubRegister”</article-title>
          .
          <source>In: Journal of English Linguistics 41.2</source>
          (
          <year>2013</year>
          ), pp.
          <fpage>104</fpage>
          -
          <lpage>134</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bird</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Klein</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Loper</surname>
          </string-name>
          .
          <source>Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit</source>
          . O'Reilly Media, Inc.,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bizzoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Degaetano-Ortlieb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fankhauser</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Teich</surname>
          </string-name>
          . “
          <article-title>Linguistic Variation and Change in 250 Years of English Scientific Writing: A Data-Driven Approach”</article-title>
          .
          <source>In: Frontiers in Artificial Intelligence</source>
          <volume>3</volume>
          (
          <year>2020</year>
          ), p.
          <fpage>73</fpage>
          . doi: 10.3389/frai.2020.00073.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Blythe</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Croft</surname>
          </string-name>
          . “
          <article-title>S-curves and the mechanisms of propagation in language change”</article-title>
          .
          <source>In: Language 88.2</source>
          (
          <year>2012</year>
          ), pp.
          <fpage>269</fpage>
          -
          <lpage>304</lpage>
          . doi: 10.1353/lan.2012.0027.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Bürkner</surname>
          </string-name>
          . “
          <article-title>Advanced Bayesian Multilevel Modeling with the R Package Brms”</article-title>
          .
          <source>In: R Journal</source>
          (
          <year>2018</year>
          ). doi: 10.32614/rj-2018-017.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Bürkner</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Charpentier</surname>
          </string-name>
          .
          <source>Modeling Monotonic Effects of Ordinal Predictors in Bayesian Regression Models</source>
          .
          <year>2018</year>
          . doi: 10.31234/osf.io/9qkhj. url: psyarxiv.com/9qkhj.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Carpenter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Hoffman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Goodrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Betancourt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brubaker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Riddell</surname>
          </string-name>
          . “
          <article-title>Stan: A probabilistic programming language”</article-title>
          .
          <source>In: Journal of Statistical Software 76.1</source>
          (
          <year>2017</year>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Correia Saavedra</surname>
          </string-name>
          . “
          <article-title>Measurements of Grammaticalization: Developing a quantitative index for the study of grammatical change”</article-title>
          .
          <source>PhD dissertation</source>
          . Neuchâtel &amp; Antwerpen: l'Université de Neuchâtel &amp; Universiteit Antwerpen
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Del Tredici</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fernández</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Boleda</surname>
          </string-name>
          . “
          <article-title>Short-term meaning shift: A distributional exploration”</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</source>
          . Minneapolis, Minnesota: Association for Computational Linguistics,
          <year>2019</year>
          , pp.
          <fpage>2069</fpage>
          -
          <lpage>2075</lpage>
          . doi: 10.18653/v1/N19-1210.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Denison</surname>
          </string-name>
          . “
          <article-title>Log(ist)ic and simplistic S-curves”</article-title>
          .
          <source>In: Motives for language change 54</source>
          (
          <year>2003</year>
          ), p.
          <fpage>70</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Dubossarsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hengchen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Schlechtweg</surname>
          </string-name>
          . “
          <article-title>Time-Out: Temporal Referencing for Robust Modeling of Lexical Semantic Change”</article-title>
          .
          <source>In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          . Florence, Italy: Association for Computational Linguistics,
          <year>2019</year>
          , pp.
          <fpage>457</fpage>
          -
          <lpage>470</lpage>
          . doi: 10.18653/v1/P19-1044.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H.</given-names>
            <surname>Dubossarsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Weinshall</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Grossman</surname>
          </string-name>
          . “
          <article-title>Outta control: Laws of semantic change and inherent biases in word representation models”</article-title>
          .
          <source>In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          . Copenhagen, Denmark: Association for Computational Linguistics,
          <year>2017</year>
          , pp.
          <fpage>1136</fpage>
          -
          <lpage>1145</lpage>
          . doi: 10.18653/v1/D17-1118.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Faruqui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dodge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Jauhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dyer</surname>
          </string-name>
          , E. Hovy, and
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          . “
          <article-title>Retrofitting Word Vectors to Semantic Lexicons”</article-title>
          . In:
          <source>Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          . Denver, Colorado: Association for Computational Linguistics,
          <year>2015</year>
          , pp.
          <fpage>1606</fpage>
          -
          <lpage>1615</lpage>
          . doi: 10.3115/v1/N15-1184.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Giulianelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Del Tredici</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Fernández</surname>
          </string-name>
          . “
          <article-title>Analysing lexical semantic change with contextualised word representations”</article-title>
          . In:
          <source>Proceedings of the 58th annual meeting of the association for computational linguistics</source>
          . Online: Association for Computational Linguistics,
          <year>2020</year>
          , pp.
          <fpage>3960</fpage>
          -
          <lpage>3973</lpage>
          . doi: 10.18653/v1/2020.acl-main.365.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          . “
          <article-title>Language Identification”</article-title>
          .
          <year>2017</year>
          . url: https://fasttext.cc/blog/2017/10/02/blogpost.html.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>W. L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          . “
          <article-title>Cultural shift or linguistic drift? Comparing two computational measures of semantic change”</article-title>
          . In:
          <source>Proceedings of the 2016 conference on empirical methods in natural language processing</source>
          . Austin, Texas: Association for Computational Linguistics,
          <year>2016</year>
          , pp.
          <fpage>2116</fpage>
          -
          <lpage>2121</lpage>
          . doi: 10.18653/v1/D16-1229.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>W. L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          . “
          <article-title>Diachronic word embeddings reveal statistical laws of semantic change”</article-title>
          . In:
          <source>Proceedings of the 54th annual meeting of the association for computational linguistics (volume 1: Long papers)</source>
          . Berlin, Germany: Association for Computational Linguistics,
          <year>2016</year>
          , pp.
          <fpage>1489</fpage>
          -
          <lpage>1501</lpage>
          . doi: 10.18653/v1/P16-1141.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hengchen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schlechtweg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Dubossarsky</surname>
          </string-name>
          . “
          <article-title>Challenges for computational lexical semantic change”</article-title>
          . In:
          <source>Zenodo</source>
          ,
          <year>2021</year>
          . doi: 10.5281/zenodo.5040322.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hilpert</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Correia Saavedra</surname>
          </string-name>
          . “
          <article-title>Using token-based semantic vector spaces for corpus-linguistic analyses: From practical applications to tests of theoretical claims”</article-title>
          . In:
          <source>Corpus Linguistics and Linguistic Theory</source>
          <volume>0</volume>
          .0 (
          <year>2017</year>
          ). doi: 10.1515/cllt-2017-0009.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>N. P.</given-names>
            <surname>Himmelmann</surname>
          </string-name>
          . “
          <article-title>Lexicalization and grammaticization: opposite or orthogonal?”</article-title>
          . In:
          <source>What Makes Grammaticalization? A Look from Its Fringes and Its Components</source>
          . Ed. by
          <string-name>
            <given-names>W.</given-names>
            <surname>Bisang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. P.</given-names>
            <surname>Himmelmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Wiemer</surname>
          </string-name>
          . Berlin: Mouton de Gruyter,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoeksema</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. Jo</given-names>
            <surname>Napoli</surname>
          </string-name>
          . “
          <article-title>Just for the hell of it: A comparison of two taboo-term constructions”</article-title>
          . In:
          <source>Journal of Linguistics</source>
          <volume>44</volume>.<issue>2</issue> (
          <year>2008</year>
          ), pp.
          <fpage>347</fpage>
          -
          <lpage>378</lpage>
          . doi: 10.1017/s002222670800515x.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          .
          <article-title>“Using the OED quotations database as a corpus - a linguistic appraisal”</article-title>
          .
          In:
          <source>ICAME Journal</source>
          <volume>28</volume>
          (
          <year>2004</year>
          ), pp.
          <fpage>17</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hopper</surname>
          </string-name>
          . “
          <article-title>On some principles of grammaticization”</article-title>
          . In:
          <source>Approaches to Grammaticalization</source>
          . Ed. by
          <string-name>
            <given-names>E. C.</given-names>
            <surname>Traugott</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Heine</surname>
          </string-name>
          . Vol.
          <volume>1</volume>
          . Amsterdam: John Benjamins,
          <year>1991</year>
          , pp.
          <fpage>17</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>R.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Liang</surname>
          </string-name>
          . “
          <article-title>Diachronic sense modeling with deep contextualized word embeddings: An ecological view”</article-title>
          . In:
          <source>Proceedings of the 57th annual meeting of the association for computational linguistics</source>
          . Florence, Italy: Association for Computational Linguistics,
          <year>2019</year>
          , pp.
          <fpage>3899</fpage>
          -
          <lpage>3908</lpage>
          . doi: 10.18653/v1/P19-1379.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jastrzebski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Leśniak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. M.</given-names>
            <surname>Czarnecki</surname>
          </string-name>
          . “
          <article-title>How to evaluate word embeddings? On importance of data efficiency and simple supervised tasks”</article-title>
          . In:
          <source>arXiv preprint arXiv:1702.02170</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kroch</surname>
          </string-name>
          .
          <source>Penn Parsed Corpora of Historical English</source>
          . Philadelphia,
          <year>2020</year>
          . url: https://www.ling.upenn.edu/hist-corpora/.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kutuzov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Øvrelid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Szymanski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Velldal</surname>
          </string-name>
          . “
          <article-title>Diachronic word embeddings and semantic shifts: a survey”</article-title>
          . In:
          <source>Proceedings of the 27th international conference on computational linguistics</source>
          . Santa Fe, New Mexico, USA: Association for Computational Linguistics,
          <year>2018</year>
          , pp.
          <fpage>1384</fpage>
          -
          <lpage>1397</lpage>
          . url: https://www.aclweb.org/anthology/C18-1117.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>G.</given-names>
            <surname>Lorenz</surname>
          </string-name>
          . “
          <article-title>Really worthwhile or not really significant? A corpus-based approach to the delexicalization and grammaticalization of intensifiers in Modern English”</article-title>
          . In:
          <source>Typological Studies in Language</source>
          . Ed. by I. Wischer and
          <string-name>
            <given-names>G.</given-names>
            <surname>Diewald</surname>
          </string-name>
          . Vol.
          <volume>49</volume>
          . Amsterdam: John Benjamins Publishing Company,
          <year>2002</year>
          , pp.
          <fpage>143</fpage>
          -
          <lpage>161</lpage>
          . doi: 10.1075/tsl.49.11lor.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Levin</surname>
          </string-name>
          . “
          <article-title>From Insanely Jealous to Insanely Delicious: Computational Models for the Semantic Bleaching of English Intensifiers”</article-title>
          . In:
          <source>Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change</source>
          . Florence, Italy: Association for Computational Linguistics,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          . doi: 10.18653/v1/W19-4701.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>E.</given-names>
            <surname>Manjavacas</surname>
          </string-name>
          , Á. Kádár, and
          <string-name>
            <given-names>M.</given-names>
            <surname>Kestemont</surname>
          </string-name>
          . “
          <article-title>Improving Lemmatization of Non-Standard Languages with Joint Learning”</article-title>
          . In:
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</source>
          . Minneapolis, Minnesota: Association for Computational Linguistics,
          <year>2019</year>
          , pp.
          <fpage>1493</fpage>
          -
          <lpage>1503</lpage>
          . url: https://www.aclweb.org/anthology/N19-1153.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>H.</given-names>
            <surname>Margerie</surname>
          </string-name>
          . “
          <article-title>Grammaticalising constructions: to death as a peripheral degree modifier”</article-title>
          . In:
          <source>Folia Linguistica</source>
          <volume>45</volume>
          , Historica vol. 32 (
          <year>2011</year>
          ). doi: 10.1515/flih.2011.005.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          , G. Corrado, and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          . “
          <article-title>Efficient Estimation of Word Representations in Vector Space”</article-title>
          . In:
          <source>1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings</source>
          . Ed. by
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          .
          <year>2013</year>
          . url: http://arxiv.org/abs/1301.3781.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Biemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Goyal</surname>
          </string-name>
          . “
          <article-title>That's sick dude!: Automatic identification of word sense change across different timescales”</article-title>
          . In:
          <source>Proceedings of the 52nd annual meeting of the association for computational linguistics (volume 1: Long papers)</source>
          . Baltimore, Maryland: Association for Computational Linguistics,
          <year>2014</year>
          , pp.
          <fpage>1020</fpage>
          -
          <lpage>1029</lpage>
          . doi: 10.3115/v1/P14-1096.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>T.</given-names>
            <surname>Nevalainen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Raumolin-Brunberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Keränen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nevala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nurmi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palander-Collin</surname>
          </string-name>
          , A. Taylor, S. Pintzuk,
          <string-name>
            <given-names>A.</given-names>
            <surname>Warner</surname>
          </string-name>
          , et al. “
          <article-title>Parsed Corpus of Early English Correspondence (PCEEC)”</article-title>
          . In:
          <source>Oxford Text Archive Core Collection</source>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Pechenick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Danforth</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Dodds</surname>
          </string-name>
          . “
          <article-title>Characterizing the Google Books Corpus: Strong Limits to Inferences of Socio-Cultural and Linguistic Evolution”</article-title>
          . In:
          <source>PLOS ONE</source>
          (
          <year>2015</year>
          ), p.
          <fpage>24</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          , et al. “
          <article-title>Scikit-learn: Machine learning in Python”</article-title>
          . In:
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          ), pp.
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Manning</surname>
          </string-name>
          . “
          <article-title>GloVe: Global Vectors for Word Representation”</article-title>
          . In:
          <source>Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          . Doha, Qatar: Association for Computational Linguistics,
          <year>2014</year>
          , pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          . doi: 10.3115/v1/D14-1162.
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>F.</given-names>
            <surname>Perek</surname>
          </string-name>
          . “
          <article-title>Recent change in the productivity and schematicity of the way-construction: A distributional semantic analysis”</article-title>
          . In:
          <source>Corpus Linguistics and Linguistic Theory</source>
          <volume>14</volume>.<issue>1</issue> (
          <year>2018</year>
          ), pp.
          <fpage>65</fpage>
          -
          <lpage>97</lpage>
          . doi: 10.1515/cllt-2016-0014.
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>F.</given-names>
            <surname>Perek</surname>
          </string-name>
          . “
          <article-title>Using distributional semantics to study syntactic productivity in diachrony: A case study”</article-title>
          . In:
          <source>Linguistics</source>
          <volume>54</volume>.<issue>1</issue> (
          <year>2016</year>
          ). doi: 10.1515/ling-2015-0043.
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>R.</given-names>
            <surname>Quirk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Greenbaum</surname>
          </string-name>
          , G. Leech, and
          <string-name>
            <given-names>J.</given-names>
            <surname>Svartvik</surname>
          </string-name>
          .
          <source>A Comprehensive Grammar of the English Language</source>
          . London: Longman,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rehurek</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Sojka</surname>
          </string-name>
          . “
          <article-title>Software Framework for Topic Modelling with Large Corpora”</article-title>
          . In:
          <source>Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks</source>
          (
          <year>2010</year>
          ), pp.
          <fpage>45</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosenfeld</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Erk</surname>
          </string-name>
          . “
          <article-title>Deep neural models of semantic shift”</article-title>
          . In:
          <source>Proceedings of the 2018 conference of the north American chapter of the association for computational linguistics: Human language technologies, volume 1 (long papers)</source>
          . New Orleans, Louisiana: Association for Computational Linguistics,
          <year>2018</year>
          , pp.
          <fpage>474</fpage>
          -
          <lpage>484</lpage>
          . doi: 10.18653/v1/N18-1044.
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Rousseeuw</surname>
          </string-name>
          . “
          <article-title>Silhouettes: a graphical aid to the interpretation and validation of cluster analysis”</article-title>
          . In:
          <source>Journal of Computational and Applied Mathematics</source>
          <volume>20</volume>
          (
          <year>1987</year>
          ), pp.
          <fpage>53</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rudolph</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Blei</surname>
          </string-name>
          . “
          <article-title>Dynamic embeddings for language evolution”</article-title>
          . In:
          <source>Proceedings of the 2018 World Wide Web Conference</source>
          . WWW '18. Republic and Canton of Geneva, CHE: International World Wide Web Conferences Steering Committee
          ,
          <year>2018</year>
          , pp.
          <fpage>1003</fpage>
          -
          <lpage>1011</lpage>
          . doi: 10.1145/3178876.3185999. url: https://doi.org/10.1145/3178876.3185999.
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>E.</given-names>
            <surname>Sagi</surname>
          </string-name>
          , S. Kaufmann, and
          <string-name>
            <given-names>B.</given-names>
            <surname>Clark</surname>
          </string-name>
          . “
          <article-title>Tracing semantic change with Latent Semantic Analysis”</article-title>
          . In:
          <source>Current Methods in Historical Semantics</source>
          . Ed. by
          <string-name>
            <given-names>K.</given-names>
            <surname>Allan</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Robinson</surname>
          </string-name>
          . Berlin, Boston: De Gruyter,
          <year>2011</year>
          . doi: 10.1515/9783110252903.161.
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>D.</given-names>
            <surname>Schlechtweg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>McGillivray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hengchen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dubossarsky</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          . “
          <article-title>SemEval2020 Task 1: Unsupervised Lexical Semantic Change Detection”</article-title>
          .
          <source>In: Proceedings of the Fourteenth Workshop on Semantic Evaluation</source>
          . Barcelona (online):
          <publisher-name>International Committee for Computational Linguistics</publisher-name>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          . url: https://aclanthology.org/2020.semeval-1.1.
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sommerauer</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Fokkens</surname>
          </string-name>
          . “
          <article-title>Conceptual Change and Distributional Semantic Models: an Exploratory Study on Pitfalls and Possibilities”</article-title>
          .
          <source>In: Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change</source>
          . Florence, Italy: Association for Computational Linguistics,
          <year>2019</year>
          , pp.
          <fpage>223</fpage>
          -
          <lpage>233</lpage>
          . doi: 10.18653/v1/W19-4728.
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>K.</given-names>
            <surname>Sun</surname>
          </string-name>
          , H. Liu, and
          <string-name>
            <given-names>W.</given-names>
            <surname>Xiong</surname>
          </string-name>
          . “
          <article-title>The evolutionary pattern of language in scientific writings: A case study of Philosophical Transactions of Royal Society (1665-1869)”</article-title>
          .
          <source>In: Scientometrics</source>
          <volume>126</volume>
          .
          <issue>2</issue>
          (
          <year>2021</year>
          ), pp.
          <fpage>1695</fpage>
          -
          <lpage>1724</lpage>
          . doi: 10.1007/s11192-020-03816-8.
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Borin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Jatowt</surname>
          </string-name>
          . “
          <article-title>Survey of Computational Approaches to Lexical Semantic Change”</article-title>
          . In: arXiv:1811.06278 [cs] (
          <year>2019</year>
          ). url: http://arxiv.org/abs/1811.06278.
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>N.</given-names>
            <surname>Younes</surname>
          </string-name>
          and U.-D. Reips. “
          <article-title>Guideline for improving the reliability of Google Ngram studies: Evidence from religious terms”</article-title>
          .
          <source>In: PLoS ONE</source>
          (
          <year>2019</year>
          ), p.
          <fpage>17</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          [53]
          <string-name>
            <given-names>L.-C.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Lai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          . “
          <article-title>Refining Word Embeddings for Sentiment Analysis”</article-title>
          .
          <source>In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          . Copenhagen, Denmark: Association for Computational Linguistics,
          <year>2017</year>
          , pp.
          <fpage>534</fpage>
          -
          <lpage>539</lpage>
          . doi: 10.18653/v1/D17-1056.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>