<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Study on Word2Vec on a Historical Swedish Newspaper Corpus</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Språkbanken &amp; Center for Digital Humanities, University of Gothenburg</institution>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Detecting word sense changes can be of great interest in the field of digital humanities. Thus far, most investigations and automatic methods have been developed and carried out on English text, and most recent methods make use of word embeddings. This paper presents a study on using Word2Vec, a neural word embedding method, on a Swedish historical newspaper collection. Our study includes a set of 11 words, and our focus is the quality and stability of the word vectors over time. We investigate whether a word embedding method like Word2Vec can be used effectively on texts where the volume and quality are limited.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Automatic detection of word sense change has been investigated for the past
decade or so, but has received increasing attention in recent years with (neural)
word embeddings as a new way forward. There are many reasons why detecting
word sense change is necessary; in addition to being interesting on its own (when
and how a word changes its meaning(s)), it is also needed for understanding
documents retrieved from historical corpora and for computationally detecting,
for example, sentiments over time.</p>
      <p>Previous methods for automatic detection of word sense change have included
the comparison of context vectors, topic models and graph-based models as well
as word embeddings. The topic modeling and graph-based methods aim to
separate a word into its different senses and make predictions for a word based on its
individual senses. The context-based methods and, lately, the word embedding
methods have made use of representations of the whole word rather than its
senses. These methods typically detect changes in the main (dominant) sense
of a word and cannot distinguish between stable senses and changing ones. This
deficiency of word embedding models can be overcome by using one embedding
per sense and then tracking these embeddings over time, allowing sense
differentiations, such as one or more stable senses alongside one or more changing ones, to capture
the full picture. (More on related work in Section 4.)</p>
      <p>Thus far, most, if not all, investigations into automatic detection of word
sense change have focused on English texts for several reasons, the availability
of large diachronic corpora being the most important one. Many (neural)
embedding methods require large amounts of data and, therefore, the applicability
of these methods is limited for languages and time spans that do not have the
required volume of digital data. This problem becomes even more acute if we
wish to make use of sense-differentiated embeddings, where there needs to be
enough data for each sense of a word, thus increasing the data requirements.</p>
      <p>
        In this paper, we will investigate the Word2Vec model [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] using the Swedish
historical newspaper archive Kubhist [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We consider this a feasibility study
on neural embeddings for the Kubhist material and, assuming the results show
reasonable quality, a starting point for automatic word sense change detection
on the basis of sense-differentiated word embeddings.
      </p>
      <p>
        We make use of 11 words, nyhet `news', tidning `newspaper', politik
`politics', telefon `telephone', telegraf `telegraph', kvinna `woman', man `man', glad
`happy', retorik `rhetoric', resa `travel' and musik `music'. While some of these
words represent rather stable concepts (e.g. news, happy) others represent new
concepts (e.g. telegraph, telephone) and some have the potential to reveal
interesting cultural changes (e.g. woman, rhetoric, travel). We begin by explaining
our method and then analyze the results for some words. Tables of top k words
not discussed in the paper can be found in the appendix. The full set of results
(all years, all top 10 words) can be found in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Method</title>
      <p>
        We begin with the Kubhist data, making use of the years 1749-1925 (excluding
Aftonbladet, which was added later).<sup>1</sup> The data can be found and investigated
in Språkbanken's research tool Korp [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The 78 papers included in the corpus
consist of 876 million tokens and close to 69 million sentences. Starting in 1845,
there are over 5 million tokens per year, with a maximum of over 14 million tokens in
1879. We lemmatize the data using the Korp infrastructure and replace
each word with its lemma. We apply Word2Vec (W2V), a two-layer
neural network, out of the box using the Deeplearning4j (DL4J) package for Java [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>We run the W2V models for each year of the dataset separately. Because
vectors cannot be compared directly when trained on different corpora (they
need to be projected onto the same space first), we make use of the words that
are closest to a vector. That is, for each word w that we investigate, we print
out the 10 words corresponding to the 10 closest vectors to the vector of w for a
given year, i.e., the 10 most similar words to w. When all years are processed, we
have a table for each word w, where each line corresponds to a year and contains
the 10 closest words. Certain years will have no words because a vector could
not be found corresponding to w, i.e., there was too little evidence for w in the
corpus during that year.</p>
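      <p>The per-year neighbor tables described above can be sketched as follows. This is a minimal illustration in Python, not the DL4J setup used in the study: it assumes that closeness is measured by cosine similarity, and the function names and the toy two-dimensional vectors are hypothetical.</p>

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_neighbors(word, vectors, k=10):
    """Return the k words closest to `word`, or [] if `word` has no vector."""
    if word not in vectors:
        return []
    target = vectors[word]
    scored = [(cosine(target, vec), other)
              for other, vec in vectors.items() if other != word]
    # Highest similarity first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


def neighbor_table(word, models_by_year, k=10):
    """Map each year to the top-k neighbors of `word` in that year's model."""
    return {year: top_k_neighbors(word, vectors, k)
            for year, vectors in sorted(models_by_year.items())}


# Toy, hand-made vectors (hypothetical); real ones would come from the
# yearly W2V models.
models_by_year = {
    1859: {"kvinna": [1.0, 0.0], "ung": [0.9, 0.1],
           "barn": [0.5, 0.5], "resa": [0.0, 1.0]},
    1860: {"ung": [1.0, 0.0], "resa": [0.0, 1.0]},
}
table = neighbor_table("kvinna", models_by_year, k=2)
print(table)
```

      <p>In this sketch, a year in which the word has no vector yields an empty list, mirroring the years without words in the tables.</p>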
      <p>To investigate these tables, we study their content closely, but we also make
use of some statistics. We are mainly interested in how stable the vector spaces
are. If there is word sense change, the vectors should be changing. However, far
from every change in the vector space corresponds to word sense change. Since
radical sense change is relatively rare, we use the stability of the vector space as
a quality measure of the vectors.</p>
      <p><sup>1</sup> There is not sufficient data in all years for producing vectors. In addition, year 1758
is included for some words and not others and therefore we have chosen to exclude
the year for all words.</p>
      <p>Table 1. Columns: word, avg. Jaccard (A), avg. freq. (B), corr(A,B). Rows: telegraph,
politics, news, woman, happy, telephone, music, travel, newspaper, man.</p>
      <p>To measure the stability, we ask how many of the top 10 words reappear
from one year to the next, and find this by calculating the Jaccard similarity between each
pair of adjacent years. Given two lists of words A and B, the Jaccard similarity is
the number of words shared by A and B divided by the number of unique words
in A and B together. For example, if A = {happy, smiling, glad} and B = {happy,
joyful, cheerful, excited}, then the overlap of A and B is 1 (since they share the
word happy) and there are 3 + 4 - 1 = 6 unique words. The Jaccard similarity is
then 1/6 = 0.167. To be able to investigate how the Jaccard similarity changes
over time, we plot the smoothened Jaccard similarity over time. The smoothing
aims to make the graph simpler to investigate and is the average value of three
years: year i (the one that is plotted), the preceding year i-1 and the following year
i+1. The exceptions are the first and the last years (1749 and 1925), where only
two years are taken into account.</p>
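      <p>The stability measure can be illustrated with a short Python sketch. It reproduces the worked example (1/6) and the three-year smoothing; the function names are hypothetical, and treating two empty lists as similarity 0 is an assumption rather than something stated in the text.</p>

```python
def jaccard(a, b):
    """Shared items divided by the number of unique items in a and b."""
    a, b = set(a), set(b)
    union = a.union(b)
    if not union:
        return 0.0  # assumption: two empty lists count as no overlap
    return len(a.intersection(b)) / len(union)


def adjacent_jaccard(table):
    """Jaccard similarity between each pair of consecutive years in `table`.

    `table` maps year -> list of top words; consecutive entries are assumed
    to be adjacent years.
    """
    years = sorted(table)
    return [(later, jaccard(table[earlier], table[later]))
            for earlier, later in zip(years, years[1:])]


def smooth(values):
    """Average of each value with its neighbors (two values at the ends)."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - 1):i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed


a = ["happy", "smiling", "glad"]
b = ["happy", "joyful", "cheerful", "excited"]
print(round(jaccard(a, b), 3))  # 0.167
```

      <p>Calling smooth on the list of yearly Jaccard values gives the series that would be plotted, while the raw values remain available for the correlation.</p>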
      <p>To put the Jaccard similarities into context, we also plot the normalized
frequency of the word w from the corpus. The normalized frequencies are
computed by Korp and are not smoothened. The correlation values between the
Jaccard similarities and the normalized frequencies are calculated on the
non-smoothened Jaccard similarities (the smoothened ones are used in the plots for
visual reasons). Finally, we will provide tables for each word where the top 10
words can be viewed for certain years.</p>
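      <p>A minimal sketch of a rank correlation between yearly Jaccard similarities and normalized frequencies is given here. It assumes Spearman's coefficient computed as the Pearson correlation of average ranks; the function names and the toy series are hypothetical, and the numbers are not taken from the study.</p>

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start != len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 != len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation as Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("series must be non-empty and of equal length")
    return pearson(average_ranks(x), average_ranks(y))


# Toy series (hypothetical): raw yearly Jaccard values and frequencies.
jaccard_by_year = [0.0, 0.1, 0.3, 0.2]
frequency_by_year = [5.0, 7.0, 20.0, 9.0]
print(spearman(jaccard_by_year, frequency_by_year))
```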
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>We begin by noting that out of the 11 investigated words, one did not have any
vector representations at all due to its low frequency in the corpus. The word
retorik `rhetoric' appears 78 times during the entire time span of Kubhist, which
amounts to at most three occurrences in any one year. That leaves us with
10 words for our investigation; in the case of `woman', however, we make use
of two spelling variants (kvinna and qvinna). In Table 1, we can see a summary
of the plots that are shown in Section 3.1. The terms are ordered on the basis of
increasing frequency. An interesting behavior is that while the average Jaccard
similarity increases by one order of magnitude between `telephone' and `music',
the normalized frequencies have a similar increase between `music' and `travel'.
With respect to correlation, `happy' seems to be a trend breaker with a low
correlation corresponding to a low instead of a high frequency.<sup>2</sup></p>
      <p><bold>3.1 Jaccard similarities</bold></p>
      <p>In this section, we provide plots for each word in our study with the exception of
`rhetoric', where we have too little data to create yearly word embeddings. Each
plot representing a word w can be read like this: the filled line is the smoothened
Jaccard similarity, the dotted line is the normalized frequency. The values of the
Jaccard similarity can be found on the left y-axis while the frequency values can
be found on the right y-axis. In the title, we show the word and after it, the
Spearman correlation value between the (non-smoothened) Jaccard similarities
and the normalized frequencies.</p>
      <p>Important to note when studying the plots is that a zero Jaccard similarity
cannot be used to determine whether a word has a vector or not; for example,
the Jaccard similarity of `woman' did not go above 0 until 1887 although the
first vector appeared in 1859.</p>
      <p><sup>2</sup> For a full answer to why these behaviors differ, many more words must be included
in our study. This is left for future work.</p>
      <p>Fig. 1. Plots for the most frequent words (x-axis: years 1749-1925; left y-axis: Jaccard similarity; right y-axis: normalized frequency). Panel titles: Man, correl = 0.066827; Newspaper, correl = 0.327119. Frequency series: resa frequency-norm, musik frequency-norm, tidning frequency-norm.</p>
      <p>Plots for `woman', `politics' and `news' (Figure 2) and for `happy' (x-axis: years 1749-1925). Panel titles: Woman, correl = 0.664913; News, correl = 0.474669. Frequency series: politik frequency-norm, kvinna frequency-norm, nyhet frequency-norm, glad frequency-norm.</p>
      <p>The plots in Figure 1 correspond to the most frequent words. Out of the four
words, `music' is the word that stands out with a high correlation. It is, however,
the word with the lowest average frequency (see Table 1). For the word
`newspaper' we find that the frequency increases after the 1870s and the correlation
between the two graphs is 0.6 for the years 1790-1877 as compared to 0.19 for
1878-1925. For the lower frequency words, `woman', `politics' (with the exception
of the events in the 1880s), and `news' (all in Figure 2) as well as `telephone'
and `telegraph' (in Figure 3), we see a high correlation between the Jaccard
similarities and the frequencies. It seems that the more frequent the term, the
lower the correlation with the Jaccard similarities. Reasonably, after a certain
amount of data has been gathered, the embeddings become less dependent on
the volume, and the lower the amount of data, the less stable the vectors.</p>
      <p>Among the most frequent words, we find reasonably high Jaccard similarities
each year, indicating fairly stable vectors. Nonetheless, on average only 10-20
percent of all words are stable for each year.</p>
      <p>The word `music' has an interesting appearance: the correlation is high and
both graphs show higher values before the 1830s, then drop and increase
slowly again. More investigation and close reading are necessary to determine
what is happening in the 1830s and 1840s.</p>
      <p>Fig. 3. Plots for `telephone' and `telegraph'. Panel titles: Telephone, correl = 0.461275; Telegraph, correl = 0.466208. Frequency series: telefon frequency-norm.</p>
      <p>In this section, we will present tables for each word and the top words that are
most similar for a given year (we include as many of the top 10 words as will
fit on the page; if all 10 words seem equally important, we will reduce the font
size). Most years are not represented due to space constraints, but the first year
is always present. For example, kvinna `woman' had a W2V vector in 1859 and
hence the top words for that year are included. The years chosen for each word
will include the years with the highest Jaccard similarities, but also years that
contain interesting words, so the years will be different for each word. In each table
we will include the top 10 words for the vector for the word when trained on the
entire Kubhist corpus at once, called all.</p>
      <table-wrap>
        <label>Table 2</label>
        <caption><p>Table for kvinna `woman'</p></caption>
        <table>
          <thead>
            <tr><th>year</th><th>words</th></tr>
          </thead>
          <tbody>
            <tr><td>1859</td><td>n&amp;gt iinnu lz lllu &amp;ltle lilliz &amp;ltli&amp;gt ll &amp;lt&amp;gt ssn</td></tr>
            <tr><td>1860</td><td>-ne folt forbi ytande napen gigg soin lwcnnc atdon mellau nntcr</td></tr>
            <tr><td>1877</td><td>patagligen ofvervunnen ehuruval alldaglig inspiration forstadda fordomsfri</td></tr>
            <tr><td>1888</td><td>qvinna yvo s flicka austins ung svagerska manniska vuxen anka valmaende</td></tr>
            <tr><td>1903</td><td>flicka barn ung excentrisk sjuk valdta oregelbundet ansedd forfora tard</td></tr>
            <tr><td>1906</td><td>ung flicka halahult hjalplos orkeslos luggsliten rostratt adling rostresurser otukt</td></tr>
            <tr><td>1910</td><td>manniska rostratt nyck ung nonchalant dodsfiende hederskansla lika dem foibi</td></tr>
            <tr><td>1911</td><td>kullkasta rostratt samtid tapper karaktarsfast armod ung skicklig dryckenskap</td></tr>
            <tr><td>1912</td><td>valbarhet valratt rostratt sjal orsorjande sexuell okunnig hogerparti politisk radikal vansterparti</td></tr>
            <tr><td>1925</td><td>lik ung sara radda foraktfullt roddbat drage hennes allvarligt medtagen</td></tr>
            <tr><td>all</td><td>ung person flicka barn var tva endast dem vuxen fullvuxen</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Fig. 4. Normalized frequency of the lemma kvinna and the words `qvinna' and `qvinnor',
a previous spelling (plot title: qvinna vs. kvinna). We find that while both co-existed during a
period, the `qv' spelling was preferred before the `kv' spelling took over.</p>
      <p>Prior to 1859, a century into our corpus, we cannot find any vectors for the
word kvinna `woman'. The explanation is that the frequency of the word
prior to 1859 is very low and mostly accidental (by means of spelling errors).
This is due to spelling variation: qvinna was the commonly used spelling.
Figure 4 shows the mostly complementary frequencies of the lemma kvinna and
the words qvinna and qvinnor. Before then, spellings like qwinna and kona were
used. Nonetheless, it seems women were mentioned more frequently toward
the middle and end of the 19th century, and the example shows the need for detecting
language changes (spelling changes as well as sense changes) when analyzing
historical texts.</p>
      <p>When it comes to the top 10 words for kvinna `woman', we find that the
first few years provide little reasonable content; the words are noisy with spelling
errors. In 1888, the words are mostly descriptive of different kinds of women:
`young', `sister in law', `grown', `human' and `widow'. We find the first and only
occurrence of valdta `to rape' in 1903 (for women), together with words like
`girl', `kid', `young' and `seduce'. The word `to rape' is most likely lemmatized
from `raped', and the same goes for `seduced'. In 1906, the word rostratt `right
to vote' shows up among the top 10 words for `woman', 13 years before women
were allowed to vote in Sweden. Around that time, we see a strong increase in
the frequency of `woman'; hence, women are more present in the newspapers. To
complement this, we have the top 10 words for the qvinna spelling in Table 3. The
first vector appears in 1828 with words that have little to do with women. In
1837, we have a description of women with `veil', `beauty', `naked' and `abuse',
which is most likely a lemma of `abused'. In 1850, women are described with
words that relate to their offspring, `still born', `twin', `boy' and `girl', while a
year later we are back to a reasonably positive description, with `lovable', `loved',
`lover', and `kissed'.</p>
      <p>For the word politik `politics' shown in Table 4, we find an interesting
behavior around the end of the 1840s and 1880s. In 1838-1839, the Swedish historian
and riksdagsman Erik Gustaf Geijer moved from conservatism to liberalism and
joined the movement for the common right to vote.<sup>3</sup> This might explain the first spike
that we see in frequency during this time. The second spike is likely due to a
newspaper called politik `politics' as seen from the quote:
i den i kopenhamn utkommande tidning " politik " `in the Copenhagen-published
newspaper "politics"'.</p>
      <p>For the word telefon `telephone' we find high values of Jaccard similarity
around the 1880s (in a period with a lower normalized frequency), which seems
to be due to a Mr Hakan Bengtton (possibly Bengtson) who was a publisher
for Goteborgs handelstidning, the Gothenburg trade paper. With low amounts
of data, these kinds of peculiarities are seen more often. Telefonf is short for
telefonforbindelse, which was another way of saying `telephone number', typically
in this format: telefon : allm . telefonf . 519 .</p>
      <table-wrap>
        <label>Table 3</label>
        <caption><p>Table for qvinna `woman'</p></caption>
        <table>
          <thead>
            <tr><th>year</th><th>words</th></tr>
          </thead>
          <tbody>
            <tr><td>1828</td><td>sang hufvud forgafves inskrankta nedslagenhet ofverhus volontar olyckligtvis</td></tr>
            <tr><td>1837</td><td>sloja spad hydda skonhet, skagg obandig naken forfora turban misshandla</td></tr>
            <tr><td>1850</td><td>dodfodda dodfodd tvilling kon gosse akta hicka lefvande promenerande flickebarn</td></tr>
            <tr><td>1851</td><td>alskvard kyssa tusenskalm alskad alskare uppfostrad jollra vardinna qvinlig korsven</td></tr>
            <tr><td>1872</td><td>qvinlig ensamhet fortrollande tjusande motvilja svartsjuka drommande vink skonhet</td></tr>
            <tr><td>all</td><td>varelse egensinnig oerfaren hjartlos ljusharig slafvinna graharig tillbedjare dygdig varldsdam</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <label>Table 4</label>
        <caption><p>Table for politik `politics'</p></caption>
        <table>
          <thead>
            <tr><th>year</th><th>words</th></tr>
          </thead>
          <tbody>
            <tr><td>1779</td><td>tilsammans faststalla runa bitradd gibraltar medborgerlig skatte arva sand intagande</td></tr>
            <tr><td>1838</td><td>intervention tadla asigt anda parlamentarisk erfaldigt politisk kraftig afgjord sansad</td></tr>
            <tr><td>1842</td><td>opposition politisk konstitutionell tadla liberal handla mening grundsats sondring</td></tr>
            <tr><td>1888</td><td>officios press tysklands trontal bulgarien europeisk novoje organ bulgarisk rysslands</td></tr>
            <tr><td>1891</td><td>socialdemokrati press politisk standpunkt statsman dementi frisinnad parlamentarisk makt</td></tr>
            <tr><td>1925</td><td>naring trygghet kamp arbetarrorelse konservativ nationell stravan europa neutralitet</td></tr>
            <tr><td>all</td><td>socialdemokrati utrikespolitik demokratisk forsvarspolitik demokrati politisk parlamentarisk taktik</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p><sup>3</sup> https://www.sydsvenskan.se/2014-03-14/las-utdrag-ur-per-t-ohlssons-nya-boksvensk-politik, from Per T Ohlsson's new book, Svensk politik `Swedish Politics'.</p>
    </sec>
    <sec id="sec-4">
      <title>State of the Art</title>
      <p>
        The first methods for automatic word sense change detection were based on
context vectors; they investigated semantic density (Sagi et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]) and utilized
mutual information scores (Gulordava and Baroni [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]) to identify semantic change
over time. Both methods detect signals of change but neither aligns senses over
time or determines what has changed.
      </p>
      <p>
        Topic-based models (where topics are interpreted as senses) have been used
to detect novel senses in one collection compared to another by identifying new
topics in the later corpus (Cook et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]; Lau et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]), or to cluster topics over
time (Wijaya and Yeniterzi [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]). A dynamic topic model that builds topics with
respect to information from the previous time point is proposed by Frermann
and Lapata [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and again sense novelty is evaluated. With the exception of
Wijaya and Yeniterzi, who partition topics, no alignment is made between topics to allow
following diachronic progression of a sense.
      </p>
      <p>
        Graph-based models are utilized by Mitra et al. [
        <xref ref-type="bibr" rid="ref12 ref13">12,13</xref>
        ] and Tahmasebi [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
and aim to reveal complex relations between a word's senses by (a) modeling
senses per se using WSI; and (b) aligning senses over time.
      </p>
      <p>
        The largest body of work has been done using word embeddings of
different kinds in recent years (Basile et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]; Kim et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]; Zhang et al.
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]). Embeddings are trained on different time-sliced corpora and compared over
time. Kulkarni et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] project words onto their frequency, POS and word
embeddings and propose a model for detecting statistically significant changes
between time periods on those projections. Hamilton et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] investigate both
similarity between a priori known pairs of words, and between a word's own
vectors over time to detect change. [
        <xref ref-type="bibr" rid="ref15 ref18 ref19">15,19,18</xref>
        ] all propose different methods for
projecting vectors from different time periods onto the same space to allow
comparison. These methods can find changes in the dominant sense of a word but
cannot differentiate between senses or allow some senses to stay stable while
others change. The advantage of word embeddings over graph-based models, for
example, is the inherent semantic similarity measure, where otherwise resources
like WordNet are often used. We believe that the future lies in a combined
approach, using embeddings (possibly multi-sense embeddings [
        <xref ref-type="bibr" rid="ref20 ref21 ref22">20,21,22</xref>
        ]) and
sense-differentiated techniques.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>In this paper, we performed a study on (neural) word embeddings for a Swedish
historical newspaper corpus, Kubhist. Our aim was to assess the quality of the
Word2Vec model when the volume and quality of the text are limited, as is the
case for most languages in historical contexts, English being the exception. Our
timespan was 1749-1925, with the majority of the content falling in the
period 1850-1900. We investigated the stability, and through that, the quality of
the resulting vector space for a set of 11 words. As a measure of stability, we used
the word overlap between the top 10 most similar words for adjacent years. We
see a clear relation between the frequency of a word and the overlap from one
year to another. The higher the frequency of a word, the higher the stability of
the vectors. Conversely, the lower the frequency, the less stability we have.</p>
      <p>Nonetheless, even the highly stable words, `music', `travel', `newspaper' and
`man', only have an average of 0.11-0.19 overlap (Jaccard similarity). This means
that even the most stable words do not share many words from one
year to another. This gives us reason to believe that the vector space produced by
Word2Vec cannot be directly used for word sense change detection, in particular
not if sense-differentiated embeddings are intended, where the textual evidence
for each word must be further divided into senses, thus decreasing the amount
of available text for each vector.</p>
      <p>
        Our findings are in line with those of [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] that point to the randomness that
affects the outcome of embeddings like Word2Vec, both in the initialization and in
the order in which the examples are seen during training, and of [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] that
point to overfitting when there is too little data. For the Kubhist data, there
are only five 10-year periods, between 1850-1890, with over 100 million tokens,
thus limiting the possibility of finding changes in stable vectors corresponding
to true word sense change.
      </p>
      <p>
        One peculiarity that we notice is the spelling errors that are present in the top
10 word lists. This indicates that one future direction is the correction of spelling
errors to increase the quality and volume of the text. Our current work aims to
investigate a newly digitized version of Kubhist (which we have been promised
in the near future by the Royal Library) to distinguish the role of OCR errors
from spelling variations and measure the improvement when correcting for both,
making use of embeddings based on Singular Value Decomposition [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], which
is better equipped for handling historical texts and removing the randomness of
Word2Vec.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work has been funded in part by the project "Towards a knowledge-based
culturomics" supported by a framework grant (2012–2016; dnr 2012-5738) and
by an infrastructure grant (SWE-CLARIN, 2014–2018; contract no.
821-2013-2003), both from the Swedish Research Council.</p>
      <p><bold>Appendix: Top k word tables</bold></p>
      <table-wrap>
        <label>Table 6</label>
        <caption><p>Table for telegraf `telegraph'</p></caption>
        <table>
          <thead>
            <tr><th>year</th><th>words</th></tr>
          </thead>
          <tbody>
            <tr><td>1802</td><td>nyfiken ytterlig fransoscrne segla pasta dey aftradande toussaint befara iudarne</td></tr>
            <tr><td>1823</td><td>telegraf-depescher vpanien lissabon bulletin corfu kapitulera rapportera ankommet</td></tr>
            <tr><td>1862</td><td>hamburg kursnotering eonsols notera tclegramm london vexelkontor telegramm</td></tr>
            <tr><td>1863</td><td>hamburg kursnotering telegramm consols paris eonsols borsforeningen london notera</td></tr>
            <tr><td>1884</td><td>telegrafering direkt posto morning post boende persontag mrd c e e franboende</td></tr>
            <tr><td>1916</td><td>tradlos texas telegrafisk arkangelsk mantyluoto graecia spanien lots kirkwall avsanda</td></tr>
            <tr><td>all</td><td>tradlos telefonforbindelse forbindelse linie dominion-liniens snabbgaende cymrlk tradlos</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <label>Table 7</label>
        <caption><p>Table for glad `happy'. In 1886, no words overlap and the words are
counterintuitive: snyfta `sob', orolig `worried' and forlagen `embarrassed'.</p></caption>
        <table>
          <thead>
            <tr><th>year</th><th>words</th></tr>
          </thead>
          <tbody>
            <tr><td>1771</td><td>passera ankomma wart frukta compagniet nodsakad besynnerlig corsica indra</td></tr>
            <tr><td>1782</td><td>fang himmel hydda frogd nad grymt hyf qval plaga gud</td></tr>
            <tr><td>1807</td><td>dina purpur hjerta ditt smarta opp karlek blick sjal it</td></tr>
            <tr><td>1883</td><td>fortjust tjusande sorgsen hanryckt foralskad silfverklar herrlig forlagen godlynt</td></tr>
            <tr><td>1886</td><td>obeskrifligt snyfta forlagen trostande orolig snallt vemodigt sucka karleksfull</td></tr>
            <tr><td>1884</td><td>fortjust bedrofvad hungrig munter frojd retligt ledsen herrlig fornojd lycklig</td></tr>
            <tr><td>1898</td><td>retlig blyg forskrackligt hungrig generad tankfullt gladt vidskeplig fortjust</td></tr>
            <tr><td>all</td><td>gladt skon munter stolt idel fortjust frojd gladje blid fager</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <label>Table 9</label>
        <caption><p>Table for man `man'. Maximum Jaccard similarity is 0.43 for 1794, where
mostly pronouns are overlapping. Fiende `enemy' is the only content word that the two
adjacent years have in common.</p></caption>
        <table>
          <thead>
            <tr><th>year</th><th>words</th></tr>
          </thead>
          <tbody>
            <tr><td>1749</td><td>eller for med inkomne pa foda wid och dod</td></tr>
            <tr><td>1794</td><td>nia men han jag gora fiende skola nal formodcligen sannolikhet</td></tr>
            <tr><td>1877</td><td>han men ofverdrifvet formatet markvardigt charlatan hon blodsutgjutelse tvifvelaktig obrottsligt</td></tr>
            <tr><td>1918</td><td>men vi emellertid aldrig nog just det nagot kanske snart</td></tr>
            <tr><td>1919</td><td>nagot ga vi men alltfor langre nog nagon ju kunna</td></tr>
            <tr><td>all</td><td>nia men han vi kunna de aven da dessa sa</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <table>
          <thead>
            <tr><th>year</th><th>words</th></tr>
          </thead>
          <tbody>
            <tr><td>1750</td><td>hogst fara kraft tyckas fattas sida igen kanna naturlig forundra</td></tr>
            <tr><td>1847</td><td>tidningar inford inrikes posloch post post-och inriket postoch post-ici ost</td></tr>
            <tr><td>1848</td><td>tidningar inrikes inford ost- post-och postoch timing imikes posloch post</td></tr>
            <tr><td>1902</td><td>borat wermlands-tidningen illustreradt lindesberg sundsvalls-posten kasor vpsala</td></tr>
            <tr><td>1903</td><td>posten bohuslanning vestergstlands vastergotlands ipsala annonsblad falu-kuriren</td></tr>
            <tr><td>all</td><td>skriva dalpil correapondenten inford stockholms-tidningen for o-tg dala-bladet nummer spalt</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>CoRR abs/1301.3781</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Språkbanken:
          <article-title>The Kubhist Corpus</article-title>
          . Department of Swedish, University of Gothenburg. https://spraakbanken.gu.se/korp/?mode=kubhist.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Tahmasebi</surname>
          </string-name>
          , N.:
          <article-title>W2V experiments on Kubhist</article-title>
          . Språkbanken, Department of Swedish, University of Gothenburg. http://hdl.handle.net/10794/word2vec-study-kubhist.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Borin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Forsberg</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roxendal</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Korp – the corpus infrastructure of Språkbanken</article-title>
          .
          <source>LREC 2012</source>
          (
          <year>2012</year>
          )
          <fpage>474</fpage>
          –
          <lpage>478</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Deeplearning4j Development Team</surname>
          </string-name>
          : Deeplearning4j:
          <article-title>Open-source distributed deep learning for the JVM</article-title>
          ,
          <source>Apache Software Foundation License 2.0</source>
          . http://deeplearning4j.org (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Sagi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaufmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Semantic density analysis: comparing word meaning across time and phonetic space</article-title>
          .
          <source>GEMS '09</source>
          ,
          <publisher-name>ACL</publisher-name>
          (
          <year>2009</year>
          )
          <fpage>104</fpage>
          –
          <lpage>111</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gulordava</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baroni</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A distributional similarity approach to the detection of semantic change in the Google Books Ngram corpus</article-title>
          .
          <source>GEMS '11</source>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          (
          <year>2011</year>
          )
          <fpage>67</fpage>
          –
          <lpage>71</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Cook</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lau</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCarthy</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baldwin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Novel word-sense identification</article-title>
          .
          <source>In: Proceedings of COLING 2014</source>
          , Dublin, Ireland (August
          <year>2014</year>
          )
          <fpage>1624</fpage>
          –
          <lpage>1635</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lau</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cook</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCarthy</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Newman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baldwin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Word sense induction for novel sense detection</article-title>
          .
          <source>In: EACL 2012</source>
          . (
          <year>2012</year>
          )
          <fpage>591</fpage>
          –
          <lpage>601</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Wijaya</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yeniterzi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Understanding semantic change of words over centuries</article-title>
          .
          <source>In: Proc. of the international workshop on DETecting and Exploiting Cultural diversiTy on the social web. DETECT '11</source>
          ,
          <publisher-name>ACM</publisher-name>
          (
          <year>2011</year>
          )
          <fpage>35</fpage>
          –
          <lpage>40</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Frermann</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lapata</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A Bayesian model of diachronic meaning change</article-title>
          .
          <source>TACL</source>
          <volume>4</volume>
          (
          <year>2016</year>
          )
          <fpage>31</fpage>
          –
          <lpage>45</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Mitra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitra</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maity</surname>
            ,
            <given-names>S.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riedl</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biemann</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mukherjee</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>An automatic approach to identify word sense changes in text media across timescales</article-title>
          .
          <source>Natural Language Engineering</source>
          <volume>21</volume>
          (
          <issue>05</issue>
          ) (
          <year>2015</year>
          )
          <fpage>773</fpage>
          –
          <lpage>798</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Mitra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitra</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riedl</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biemann</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mukherjee</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>That's sick dude!: Automatic identification of word sense change across different timescales</article-title>
          .
          <source>In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics</source>
          ,
          ACL 2014, USA. (
          <year>2014</year>
          )
          <fpage>1020</fpage>
          –
          <lpage>1029</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Tahmasebi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Risse</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Finding individual word sense changes and their delay in appearance</article-title>
          .
          <source>In: Proceedings of the International Conference Recent Advances in Natural Language Processing</source>
          ,
          RANLP 2017. (
          <year>2017</year>
          )
          <fpage>741</fpage>
          –
          <lpage>749</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caputo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luisi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Semeraro</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Diachronic analysis of the Italian language exploiting Google Ngram</article-title>
          .
          <source>In: Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016)</source>
          . (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chiu</surname>
            ,
            <given-names>Y.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanaki</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hegde</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petrov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Temporal analysis of language through neural language models</article-title>
          .
          <source>In: Workshop on Language Technologies and Computational Social Science</source>
          . (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jatowt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tanaka</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Detecting evolution of concepts based on cause-effect relationships in online reviews</article-title>
          .
          <source>In: Proceedings of the 25th International Conference on World Wide Web, ACM</source>
          (
          <year>2016</year>
          )
          <fpage>649</fpage>
          –
          <lpage>660</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Kulkarni</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Al-Rfou</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perozzi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skiena</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Statistically significant detection of linguistic change</article-title>
          .
          <source>In: World Wide Web, ACM</source>
          (
          <year>2015</year>
          )
          <fpage>625</fpage>
          –
          <lpage>635</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Hamilton</surname>
            ,
            <given-names>W.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leskovec</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Diachronic word embeddings reveal statistical laws of semantic change</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Trask</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michalak</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>sense2vec - A fast and accurate method for word sense disambiguation in neural word embeddings</article-title>
          .
          <source>CoRR abs/1511.06388</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Do multi-sense embeddings improve natural language understanding?</article-title>
          <source>In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>
          , ACL (
          <year>2015</year>
          )
          <fpage>1722</fpage>
          –
          <lpage>1732</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Pelevina</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arefyev</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biemann</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panchenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Making sense of word embeddings</article-title>
          .
          <source>In: Proceedings of the 1st Workshop on Representation Learning for NLP</source>
          . (
          <year>2016</year>
          )
          <fpage>174</fpage>
          –
          <lpage>183</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Hellrich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hahn</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Bad company - neighborhoods in neural embedding spaces considered harmful</article-title>
          .
          <source>In: COLING 2016</source>
          . (
          <year>2016</year>
          )
          <fpage>2785</fpage>
          –
          <lpage>2796</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Bamler</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mandt</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Dynamic word embeddings</article-title>
          .
          <source>In: Proceedings of the 34th International Conference on Machine Learning</source>
          ,
          ICML 2017. (
          <year>2017</year>
          )
          <fpage>380</fpage>
          –
          <lpage>389</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>