<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Clustering Ideological Terms in Historical Newspaper Data with Diachronic Word Embeddings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jani Marjanen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lidia Pivovarova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elaine Zosa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jussi Kurunmaki</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Helsinki</institution>
          ,
          <addr-line>Helsinki</addr-line>
          ,
          <country country="FI">Finland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Tampere</institution>
          ,
          <addr-line>Tampere</addr-line>
          ,
          <country country="FI">Finland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>During the course of the nineteenth century, ideological language, mostly expressed through isms such as liberalism, socialism or conservatism, entered the lexicon in most European languages. Previous research, based on reading key texts, has claimed that the suffix ism was introduced to new linguistic domains during the period up to WWI, many of which do not relate to ideology. This paper takes a data-driven approach to studying the emergence of isms in nineteenth-century Finnish newspapers and uses word embeddings to cluster them and to trace their thematic expansion over the period. As such, the study provides a quantitatively sound way of tracking how isms relate to ideological language and contributes more generally to the understanding of the development of political language in Finland.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>to cluster different isms and words close to them in the distributional space over time. In
doing so we assume that the distributions allow for clustering isms either according to semantic
similarity or similarity in rhetorical tropes or pragmatics.</p>
      <p>In this paper we do not use methods based on comparing a word vector from one time slice
to a vector for the same word in another time slice, since those methods are aimed at finding
radical changes in word senses, such as the new senses acquired by words like gay or
computer in the twentieth century. The words we are primarily interested in here did
not undergo such radical transformations (e.g. patriotism meant more or less `love for one's
country' throughout the whole nineteenth century), though the context and valuation of their usage
changed. Instead, we apply clustering of word vectors and demonstrate that word clusters
changed as the context of the ism vocabulary expanded over time.</p>
      <p>
        Clustering isms over a long period of time in a data-driven way poses a number of
methodological problems, which require testing and exploration. The potential benefit of doing this
lies in producing a statistically robust image of how isms developed. Earlier studies have
argued that isms moved from the religious sphere to the political and ideological sphere in
the late eighteenth and early nineteenth century, with pivotal isms such as patriotism,
liberalism and socialism transforming the field. The field of isms further expanded in the late
nineteenth century as new isms in philosophy, science and the arts appeared [
        <xref ref-type="bibr" rid="ref5 ref7">9, 11</xref>
        ]. Data-driven
clustering already shows how the vocabulary of isms indeed expanded over the
nineteenth century and how the political isms cluster quite heavily, whereas medical words
ending with the same suffix, such as the very common word rheumatism, are clearly kept
separate from any ideological debate revolving around ism words. Our analysis also suggests
that, as the political context changed, key isms were clustered differently depending on the
political situation they described. This change is partly, but not only, about changes in semantics.
For instance, an ism like `socialism' had a remarkable semantic continuity throughout the
nineteenth century, but what it meant for newspapers to write about socialism changed once
socialism had become more closely associated with radicalized political events. Contestation regarding
socialism had much to do with the potential radical futures associated with it.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Research questions, methods and data</title>
      <sec id="sec-2-1">
        <title>Research questions</title>
        <p>This paper studies isms as particularly laden keywords in societal discourse in Finland in the
long nineteenth century. We address the following research questions:</p>
        <p>How did the vocabulary of isms expand in the period?</p>
        <p>Which isms appear as similar based on their embeddings?</p>
        <p>Are there interesting continuities in the enriched clustering that takes into account the nearest
neighbors of the isms?</p>
        <p>Finally, we briefly discuss the differences between Finnish-language and Swedish-language discourse
in Finland when viewed through isms.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Data</title>
        <p>
          We use a digitized collection of nineteenth-century Finnish newspapers freely available from
the National Library of Finland [
          <xref ref-type="bibr" rid="ref15">19</xref>
          ]. Though the archive contains newspapers from
the 1770s onward, the earlier time periods do not have enough data for the automatic analysis we apply
in this paper. Thus, we use data from 1820 to 1917. The collection contains newspapers in
Russian, German, Swedish and Finnish, with the latter two as the dominant
languages. In our analysis, these dominant languages are treated as two separate corpora even
though contemporaries often relied on newspapers in both languages [4]. The total number of
words in both corpora is presented in Table 1.
        </p>
        <p>
          Both corpora are lowercased and lemmatized using LAS, an open-source language-analysis
tool [
          <xref ref-type="bibr" rid="ref11">15</xref>
          ]. LAS is a meta-analysis tool that provides a wrapper for many existing tools developed
for specific tasks and languages. Though LAS supports multiple languages, most of the development effort
went into processing Finnish data, including historical Finnish. The output for our Swedish data
is noisier. In particular, the Swedish LAS lemmatizer is unable to predict the lemma for
out-of-vocabulary words, e.g. boulangismen (the definite form of `boulangism'). We therefore apply an
additional normalization step and convert all words ending in -ismen or -ismens into their -ism forms.
For all other words we use the LAS output; implementing proper Swedish lemmatization
is beyond the scope of this paper.
        </p>
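        <p>The suffix normalization described above can be sketched as follows. This is only an illustration under stated assumptions: the function name and the regular expression are hypothetical and are not taken from the authors' code; only the rule itself (-ismen and -ismens become -ism) comes from the text.</p>
        <preformat><![CDATA[
```python
import re

# Hypothetical helper, not the authors' code: matches the Swedish definite
# (-ismen) and definite genitive (-ismens) endings at the end of a token.
_ISM_DEFINITE = re.compile(r"ismens?$")


def normalize_swedish_ism(token: str) -> str:
    """Lowercase the token and map -ismen / -ismens to the bare -ism form."""
    return _ISM_DEFINITE.sub("ism", token.lower())


print(normalize_swedish_ism("boulangismen"))  # boulangism
print(normalize_swedish_ism("socialismens"))  # socialism
print(normalize_swedish_ism("tidning"))       # tidning (unchanged)
```
]]></preformat>
        <p>Tokens that do not end in one of the two suffixes pass through unchanged, so the rule can be applied after the LAS output without affecting other lemmas.</p>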
      </sec>
      <sec id="sec-2-3">
        <title>Diachronic embeddings</title>
        <p>
          To trace semantic shifts in word meanings we split each lemmatized corpus into double decades
(1820–1839, 1840–1859, and so on until 1900–1917) and train continuous embeddings [
          <xref ref-type="bibr" rid="ref14">18</xref>
          ] on
each time slice. We use the Gensim Word2Vec implementation [
          <xref ref-type="bibr" rid="ref17">21</xref>
          ] with the Skip-gram model,
a vector dimensionality of 100, a window size of 5 and a frequency threshold of 100: only
lemmas that appear more than 100 times within a double decade are used for training. In this way
we try to ensure that each word in a model has a reliable amount of context and that the embeddings
are trustworthy. However, we lose some isms because they appear fewer than 100 times in a
double decade. For example, the Finnish word feminismi was mentioned 91 times between 1900
and 1917 and was excluded from our analysis, while its Swedish counterpart was mentioned
242 times and is visible in our results. Our models allow us to detect when a word became
frequent, in what context it was used and how the Swedish and Finnish
contexts differ. They do not, however, allow us to check when a word first appeared, and
comparisons of word distributions between languages are not fully reliable for less frequent words.
        </p>
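        <p>The slicing and the frequency threshold can be illustrated with a small sketch. It assumes a hypothetical input format of (year, lemmas) pairs and uses hypothetical function names; the slice boundaries and the "more than 100 occurrences" rule are taken from the text, and the feminismi/feminism counts are the ones reported above.</p>
        <preformat><![CDATA[
```python
from collections import Counter

# Double decades as described in the text; the last slice is shorter.
SLICES = [(1820, 1839), (1840, 1859), (1860, 1879), (1880, 1899), (1900, 1917)]


def slice_for_year(year):
    """Return the (start, end) double decade containing `year`, or None."""
    for start, end in SLICES:
        if start <= year <= end:
            return (start, end)
    return None


def frequent_lemmas(docs, threshold=100):
    """docs: iterable of (year, list_of_lemmas) pairs (assumed format).

    Returns a dict mapping each slice to the set of lemmas that occur
    more than `threshold` times within that slice.
    """
    counts = {s: Counter() for s in SLICES}
    for year, lemmas in docs:
        s = slice_for_year(year)
        if s is not None:
            counts[s].update(lemmas)
    return {s: {w for w, n in c.items() if n > threshold}
            for s, c in counts.items()}


# Counts from the text: feminismi occurs 91 times, feminism 242 times.
finnish = frequent_lemmas([(1905, ["feminismi"] * 91)])
swedish = frequent_lemmas([(1905, ["feminism"] * 242)])
print(finnish[(1900, 1917)])  # set()
print(swedish[(1900, 1917)])  # {'feminism'}
```
]]></preformat>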
        <p>
          Since training word embeddings is a stochastic process, the particular values of the vectors do
not stay close across runs, though distances between words are quite stable. To ensure that
embeddings are stable across time slices, we follow the approach proposed in [
          <xref ref-type="bibr" rid="ref6">10</xref>
          ]: embeddings
for time slice t + 1 are initialized with the vectors built on slice t; training then continues using the new data.
The learning rate is set to the end learning rate of the previous model, to prevent the models
from diverging rapidly. This approach has previously been used in [
          <xref ref-type="bibr" rid="ref4">8</xref>
          ] with slightly different
data.
        </p>
        <p>LAS is available at https://github.com/jiemakel/las.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Clustering</title>
        <p>
          To investigate the expansion of the vocabulary of isms we cluster words into semantically close
groups. Since our task is mostly exploratory and the number of clusters cannot be known
in advance, we apply the Affinity Propagation clustering technique [
          <xref ref-type="bibr" rid="ref2">6</xref>
          ]. The method splits all
data points into exemplars, i.e. cluster representatives, and instances, i.e. the other members
of clusters. At the initial step each data point forms a cluster of its own. Then, for each
instance-exemplar pair, the likelihood of the instance being represented by the exemplar is
computed by taking into account all other instances of the exemplar and all other available
exemplars for the instance. This computation is repeated until convergence; if an exemplar has
no instances it is dismissed. We use the standard implementation from the scikit-learn package [
          <xref ref-type="bibr" rid="ref16">20</xref>
          ],
with default parameters.
        </p>
        <p>
          Affinity Propagation has previously been used for various language analysis tasks,
including clustering collocations into semantically related classes [
          <xref ref-type="bibr" rid="ref9">13</xref>
          ] and unsupervised word sense
induction [1]. The main advantages of the method are that it detects the number of clusters
automatically and is able to produce clusters of various sizes. As a side effect it returns
exemplars, i.e. cluster representatives, which are not necessarily equal to the geometric centre of the
cluster.
        </p>
        <p>The main drawback of Affinity Propagation is its pairwise computations. The method
is quadratic in time and memory and cannot be applied to large datasets, such as a whole
corpus vocabulary. Thus, data selection is an unavoidable step. In this paper we use Affinity
Propagation in two experiments.</p>
        <p>In the first experiment, we extract from the corpus all ism words, i.e. words that end with
-ism in Swedish and -ismi in Finnish, and cluster only this set of words. The extraction allows
us to identify how close these words are to each other given the other isms in the corpus.</p>
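        <p>A minimal sketch of this extraction step is given below. The function name and language codes are hypothetical; the suffixes come from the text, and the minimum lengths (5 characters for Swedish, 6 for Finnish) are the ones the paper states it uses to filter out OCR fragments such as ism, tism and rism.</p>
        <preformat><![CDATA[
```python
# Suffix and minimum word length per language, as stated in the paper.
# The dictionary keys "sv" and "fi" are an assumption made for this sketch.
ISM_RULES = {"sv": ("ism", 5), "fi": ("ismi", 6)}


def extract_isms(vocabulary, language):
    """Return the sorted ism words of a vocabulary for the given language."""
    suffix, min_len = ISM_RULES[language]
    return sorted(w for w in set(vocabulary)
                  if w.endswith(suffix) and len(w) >= min_len)


print(extract_isms(["socialism", "pietism", "tism", "rism", "ism", "tidning"], "sv"))
# ['pietism', 'socialism']
print(extract_isms(["sosialismi", "ismi", "tismi", "lehti"], "fi"))
# ['sosialismi']
```
]]></preformat>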
        <p>In the second experiment, we try to put isms into a richer context and trace other words
associated with them in the respective double decades. We extract from the corpus all words
whose cosine similarity to any ism is greater than 0.5. Then we perform clustering
on this enriched dataset. Finally, the clusters are filtered so that only clusters that contain at
least one ism word are presented for the qualitative analysis. The output of this procedure
differs from that of the first experiment, i.e. words that clustered together in the ism-only
clustering can break up into different enriched clusters, since in the latter setting they have
more exemplar options.</p>
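        <p>The enrichment and filtering steps can be sketched as follows. The sketch assumes that the 0.5 threshold is meant to keep words that lie close to at least one ism (cosine similarity above 0.5); the function names and the toy two-dimensional vectors are invented for illustration and do not come from the paper's models.</p>
        <preformat><![CDATA[
```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def enrich(vectors, isms, threshold=0.5):
    """Return the isms plus every word closer than `threshold` to some ism.

    Assumption: "close" means cosine similarity strictly above the threshold.
    """
    selected = set(isms)
    for word, vec in vectors.items():
        if word in selected:
            continue
        if any(cosine(vec, vectors[ism]) > threshold for ism in isms):
            selected.add(word)
    return selected


def clusters_with_isms(clusters, isms):
    """Keep only the clusters that contain at least one ism word."""
    return [c for c in clusters if any(w in isms for w in c)]


# Toy vectors, invented for this example.
vectors = {"socialism": (1.0, 0.0), "arbetare": (0.9, 0.1), "regn": (0.0, 1.0)}
print(sorted(enrich(vectors, {"socialism"})))  # ['arbetare', 'socialism']
print(clusters_with_isms([{"socialism", "arbetare"}, {"regn"}], {"socialism"}))
```
]]></preformat>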
        <p>Clustering is performed separately for each time slice. To link clusters across time we
perform visualization with Sankey charts. In the Sankey diagram, clusters from time slice t are
linked to clusters in time slice t + 1 based on the number of words they have in common.</p>
        <p>The magnitude of the link is the sum of the word frequencies (from the source cluster, that
is the cluster from time slice t) of the common words between the connected clusters.</p>
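        <p>The link computation described above can be sketched as follows. The data layout (dictionaries from a cluster label to a set of words) and the function name are assumptions made for this illustration, and the frequencies in the example are invented toy numbers rather than corpus counts.</p>
        <preformat><![CDATA[
```python
def sankey_links(clusters_t, clusters_next, freq_t):
    """Compute Sankey links between two consecutive time slices.

    clusters_t, clusters_next: dict mapping a cluster label to a set of words.
    freq_t: dict mapping a word to its frequency in time slice t.
    Returns a list of (source_label, target_label, weight) tuples, where the
    weight is the summed slice-t frequency of the words the clusters share.
    """
    links = []
    for src, src_words in clusters_t.items():
        for dst, dst_words in clusters_next.items():
            common = src_words & dst_words
            if common:
                weight = sum(freq_t.get(w, 0) for w in common)
                links.append((src, dst, weight))
    return links


# Toy example with invented frequencies.
clusters_t = {"patriotism": {"patriotism", "despotism", "fanatism"}}
clusters_next = {
    "socialism": {"socialism", "despotism"},
    "liberalism": {"liberalism", "patriotism", "fanatism"},
}
freq_t = {"patriotism": 300, "despotism": 150, "fanatism": 120}
print(sankey_links(clusters_t, clusters_next, freq_t))
# [('patriotism', 'socialism', 150), ('patriotism', 'liberalism', 420)]
```
]]></preformat>
        <p>Cluster pairs without shared words produce no link, so clusters that appear for the first time in slice t + 1 have no incoming flow in the diagram.</p>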
        <p>We exclude from the list words that are shorter than 5 characters for Swedish and 6 characters for Finnish.
This is to filter out obvious OCR errors such as ism, tism, rism, etc. Though the words `ism' and `ismi' exist in
the Swedish and Finnish languages, they are very uncommon in the nineteenth-century press.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <sec id="sec-3-1">
        <title>Swedish and Finnish clusters</title>
        <p>As expected, Finnish-language and Swedish-language isms cluster differently in terms of timing
and the themes that are present. There are three main reasons for this:</p>
        <p>The Swedish-language press in Finland developed earlier and included more abstract content
earlier in the century, whereas newspapers in Finnish (and the Finnish written language)
started maturing only in the latter half of the century. Consequently, we have been able
to produce meaningful clusters of isms from the 1820s onward for Swedish and only from the
1860s onward for Finnish.</p>
        <p>The suffix -ismi was not productive in the Finnish language but was used through cognate
loans and through analogous derivation of foreign words. Consequently, isms are in general
less common and ism words less productive in Finnish than in Swedish, but they were nonetheless
used, especially as Finnish political language in the nineteenth century developed through
an interplay between the two main languages in the country.</p>
        <p>
          The political outlook of the two languages was slightly different. From the 1880s onward
the Finnish and Swedish newspapers were printed in nearly equal amounts. At this time
the language spheres also started specializing. Swedish speakers lived mostly in larger
towns and around the coast, whereas Finnish speakers occupied the whole country [
          <xref ref-type="bibr" rid="ref12">16</xref>
          ].
At this point, Finnish-language papers were more likely to have a rural or working-class
background and Swedish-language papers were more likely to be urban, liberal and
bourgeois, which naturally also shows in the use of isms. This is visible in the
proportionately large role that the cluster around socialism plays in Finnish compared to
Swedish. The clusters clearly show how Finnish-language ism vocabulary was more
politically oriented in the early twentieth century. Cultural, philosophical and scientific isms
were less present. This has partly to do with the outlook of Finnish-language newspapers,
but partly it seems that political isms were not as easily translated into vernacular forms
without an ism, whereas for other terminology this option was more readily at hand.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Politics and ideology as distinct clusters</title>
        <p>Aligning the clusters in the Sankey plots makes it possible to explore visually how the
vocabulary of isms developed over the course of the century. As can be seen in Figure 1, for
Swedish there is quite a steady expansion of isms from the 1820s onward. As the models for
producing the clusters rely on enough data points for training, particular clusters appear with a
delay compared to the first uses of particular words. For instance, patriotism first appears
in the corpus in 1791 and liberalism in 1820, but the clusters of which they are part (though not
necessarily as cluster representatives or most frequent members) appear in 1820–1839 and 1840–1859,
as can be seen in the Swedish clusters. The word socialism first appears in 1840 and is
already part of a cluster for 1840–1859, since it immediately became popular and the
number of newspapers in Swedish had already grown.</p>
        <p>Figure 1 suggests that there is a clear continuity in the politically laden isms, which start
from a single cluster with patriotism, fanatism (Eng. fanaticism) and despotism in
1820–1839 and continue with an expansion over the consecutive double decades. The most frequent
isms in the political clusters are patriotism, socialism and despotism up to 1859, and then
boulangism, fanatism, anarkism, nationalism and kapitalism (Eng. capitalism) up to 1917.
There is some fluctuation between the political clusters, like liberalism and patriotism being
quite tightly associated until the last time slice of the investigated period, and some unsurprising
continuities, like konservatism (Eng. conservatism) and liberalism being in the same clusters
throughout. Still, it seems that there is less fluctuation between the distinctly political clusters
and the other clusters. The religious isms (starting from pietism) and medical isms
(rheumatism) also come across as reasonably stable. The philosophical, artistic and scientific isms
are also distinguishable, albeit less clear-cut.</p>
        <p>For Finnish, the data is too scarce to produce meaningful clusters for more than three time
slices. Even though the Finnish corpus for the 1880–1899 double decade is comparable in size
with the Swedish corpus, the number of distinct isms in Finnish is smaller than in Swedish: 44
for Finnish and 125 for Swedish.</p>
        <p>
          With scarcer data the distinctness of the clusters is even clearer. Clusters with socialism as
the most frequent ism are rather dominant both for Swedish and Finnish, but the role of
socialism as a pivotal ism is even more pronounced for the latter, as is also indicated by [
          <xref ref-type="bibr" rid="ref13">17</xref>
          ]. Further
work is needed to explain this in more detail, but apart from the above-mentioned demographic
and political background factors for the Finnish-language press, it also seems that the discourse on
socialism may have been less confined in Finnish than in Swedish. Clustering the words with
a cosine similarity to any ism word provides more information about the linguistic contexts of
each ism. Table 2 shows that, for the period 1900–1917, Finnish-language clusters with associated
words include more religious (and to a certain extent also scientific) terminology than the more
political discourse visible in the Swedish-language clusters. Why socialist discourse was more prone
to tap into a reservoir of religious rhetoric in Finnish than in Swedish requires further study.
        </p>
        <p>[Figure panels: (a) 1880–1889, (b) 1900–1917]</p>
        <p>Both Swedish-language and Finnish-language clusters include separate clusters for
rheumatism (with spelling variations), which are almost self-contained. Rheumatism, albeit an ism
based strictly on spelling, does not cluster with other isms, but has a distinct use in the medical
discourse of the time. This shows that our clustering method is effective, but it also
indicates that historical language use distinguished between different types of isms.
Some simply ended with the suffix, while others were seen as belonging to groups of other isms.
Rheumatism also stands out as a specific type of term in the newspaper medium, as it was very
often used as a stand-alone word in advertisements or lists of illnesses.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion and Future Work</title>
      <p>
        There are alternative ways to build diachronic embeddings. A recent line of research aims
at smooth time representation [
        <xref ref-type="bibr" rid="ref18 ref22 ref3">3, 7, 22, 26</xref>
        ]. These methods reveal gradual semantic changes
over the years instead of dividing the data into discrete time slices. In the future we plan to
use one of these methods to investigate the semantic drift of ideological terms in more detail.
We further aim to explore methods for cross-language cluster comparison. In the case of ism
words, translations between Finnish and Swedish are near at hand, as is clear in Figures 2a and
2b, but a proper comparison of the clusters needs further methodological exploration.
      </p>
      <p>For examples see Hufvudstadsbladet, 23.11.1907, nro 320, p. 8; Wiborgs Nyheter, 23.01.1903, nro 18, p. 3;
Uusi Suometar, 04.06.1905, nro 128, p. 8.</p>
      <p>representatives are marked with italic, isms are highlighted with bold.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>We are grateful to Simon Hengchen and Mark Granroth-Wilding for their help with data
preparation. This work has been supported by the European Union's Horizon 2020 research and
innovation programme under grants 770299 (NewsEye) and 825153 (EMBEDDIA).</p>
      <p>[1] Domagoj Alagić, Jan Šnajder, and Sebastian Padó. Leveraging lexical substitutes for unsupervised
word sense induction. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.</p>
      <p>[2] Cesare Cuttica. To use or not to use ... the intellectual historian and the isms: A survey and a
proposal. Études Épistémè, 23, 2013.</p>
      <p>[3] Haim Dubossarsky, Simon Hengchen, Nina Tahmasebi, and Dominik Schlechtweg. Time-out:
Temporal referencing for robust modeling of lexical semantic change. In The 57th Annual Meeting
of the Association for Computational Linguistics (ACL), 2019.</p>
      <p>[4] Max Engman. Språkfrågan: Finlandssvenskhetens uppkomst 1812–1922. Svenska
litteratursällskapet i Finland, 2016.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Freeden</surname>
          </string-name>
          .
          <article-title>Ideology: A very short introduction</article-title>
          . Oxford University Press,
          <year>2003</year>
          . OCLC:
          <volume>312572349</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Brendan J.</given-names>
            <surname>Frey</surname>
          </string-name>
          and
          <string-name>
            <given-names>Delbert</given-names>
            <surname>Dueck</surname>
          </string-name>
          .
          <article-title>Clustering by passing messages between data points</article-title>
          .
          <source>Science</source>
          ,
          <volume>315</volume>
          (
          <issue>5814</issue>
          ):
          <fpage>972</fpage>
          –
          <lpage>976</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Nabeel</given-names>
            <surname>Gillani</surname>
          </string-name>
          and
          <string-name>
            <given-names>Roger</given-names>
            <surname>Levy</surname>
          </string-name>
          .
          <article-title>Simple dynamic word embeddings for mapping perceptions in the public sphere</article-title>
          .
          <source>In NAACL HLT</source>
          <year>2019</year>
          , page
          <volume>94</volume>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Simon</given-names>
            <surname>Hengchen</surname>
          </string-name>
          , Ruben Ros, and
          <string-name>
            <given-names>Jani</given-names>
            <surname>Marjanen</surname>
          </string-name>
          .
          <article-title>A data-driven approach to the changing vocabulary of the nation in English, Dutch, Swedish and Finnish newspapers, 1750-1950</article-title>
          . In
          <source>Proceedings of the Digital Humanities (DH) conference</source>
          <year>2019</year>
          , Utrecht, The Netherlands,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Höpfl</surname>
          </string-name>
          .
          <article-title>Isms</article-title>
          .
          <source>British Journal of Political Science</source>
          ,
          <volume>13</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          –
          <lpage>17</lpage>
          ,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Yoon</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yi-I</given-names>
            <surname>Chiu</surname>
          </string-name>
          , Kentaro Hanaki, Darshan Hegde, and
          <string-name>
            <given-names>Slav</given-names>
            <surname>Petrov</surname>
          </string-name>
          .
          <article-title>Temporal analysis of language through neural language models</article-title>
          .
          <source>ACL</source>
          <year>2014</year>
          , page
          <volume>61</volume>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Jussi</given-names>
            <surname>Kurunmäki</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jani</given-names>
            <surname>Marjanen</surname>
          </string-name>
          .
          <article-title>Isms, ideologies and setting the agenda for public debate</article-title>
          .
          <source>Journal of Political Ideologies</source>
          ,
          <volume>23</volume>
          (
          <issue>3</issue>
          ):
          <fpage>256</fpage>
          –
          <lpage>282</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Jussi</given-names>
            <surname>Kurunmäki</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jani</given-names>
            <surname>Marjanen</surname>
          </string-name>
          .
          <article-title>A rhetorical view of isms: an introduction</article-title>
          .
          <source>Journal of Political Ideologies</source>
          ,
          <volume>23</volume>
          (
          <issue>3</issue>
          ):
          <fpage>241</fpage>
          –
          <lpage>255</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Andrey</given-names>
            <surname>Kutuzov</surname>
          </string-name>
          , Elizaveta Kuzmenko, and
          <string-name>
            <given-names>Lidia</given-names>
            <surname>Pivovarova</surname>
          </string-name>
          .
          <article-title>Clustering of Russian adjective-noun constructions using word embeddings</article-title>
          . In
          <source>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</source>
          , pages
          <fpage>3</fpage>
          –
          <lpage>13</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Andrey</given-names>
            <surname>Kutuzov</surname>
          </string-name>
          , Lilja Øvrelid, Terrence Szymanski, and
          <string-name>
            <given-names>Erik</given-names>
            <surname>Velldal</surname>
          </string-name>
          .
          <article-title>Diachronic word embeddings and semantic shifts: a survey</article-title>
          . In
          <source>Proceedings of the 27th International Conference on Computational Linguistics</source>
          , pages
          <fpage>1384</fpage>
          –
          <lpage>1397</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [15] Eetu Makela.
          <article-title>Las: an integrated language analysis tool for multiple languages</article-title>
          .
          <source>The Journal of Open Source Software</source>
          ,
          <volume>1</volume>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Jani</given-names>
            <surname>Marjanen</surname>
          </string-name>
          , Ville Vaara, Antti Kanner, Hege Roivainen, Eetu Mäkelä, Leo Lahti, and
          <string-name>
            <given-names>Mikko</given-names>
            <surname>Tolonen</surname>
          </string-name>
          .
          <article-title>A national public sphere? Analyzing the language, location, and form of newspapers in Finland</article-title>
          ,
          <fpage>1771</fpage>
          -
          <lpage>1917</lpage>
          .
          <source>Journal of European Periodical Studies</source>
          ,
          <year>2019</year>
          (forthcoming).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Wiktor</given-names>
            <surname>Marzec</surname>
          </string-name>
          and
          <string-name>
            <given-names>Risto</given-names>
            <surname>Turunen</surname>
          </string-name>
          .
          <article-title>Socialisms in the Tsarist Borderlands</article-title>
          .
          <source>Contributions to the History of Concepts</source>
          ,
          <volume>13</volume>
          (
          <issue>1</issue>
          ):
          <fpage>22</fpage>
          –
          <lpage>50</lpage>
          ,
          <year>June 2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Kai Chen, Greg S Corrado, and Jeffrey Dean.
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>In NIPS</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Tuula</given-names>
            <surname>Pääkkönen</surname>
          </string-name>
          , Jukka Kervinen, Asko Nivala, Kimmo Kettunen, and Eetu Mäkelä.
          <article-title>Exporting Finnish digitized historical newspaper contents for offline use</article-title>
          .
          <source>D-Lib Magazine</source>
          ,
          <volume>22</volume>
          (
          <issue>7</issue>
          /8),
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          .
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>12</volume>
          :
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Radim</given-names>
            <surname>Rehurek</surname>
          </string-name>
          and
          <string-name>
            <given-names>Petr</given-names>
            <surname>Sojka</surname>
          </string-name>
          .
          <article-title>Software Framework for Topic Modelling with Large Corpora</article-title>
          .
          <source>In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks</source>
          , pages
          <fpage>45</fpage>
          -
          <lpage>50</lpage>
          , Valletta, Malta, May
          <year>2010</year>
          . ELRA.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Alex</given-names>
            <surname>Rosenfeld</surname>
          </string-name>
          and
          <string-name>
            <given-names>Katrin</given-names>
            <surname>Erk</surname>
          </string-name>
          .
          <article-title>Deep neural models of semantic shift</article-title>
          .
          <source>In NAACL HLT 2018</source>
          , pages
          <fpage>474</fpage>
          -
          <lpage>484</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Magnus</given-names>
            <surname>Sahlgren</surname>
          </string-name>
          .
          <article-title>The distributional hypothesis</article-title>
          .
          <source>Italian Journal of Linguistics</source>
          ,
          <volume>20</volume>
          :
          <fpage>33</fpage>
          -
          <lpage>53</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Ivo</given-names>
            <surname>Spira</surname>
          </string-name>
          .
          <article-title>A conceptual history of Chinese -isms: The modernization of ideological discourse, 1895-1925</article-title>
          .
          Volume 4 of
          <series>Conceptual history and Chinese linguistics</series>
          .
          <publisher-name>Brill</publisher-name>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Nina</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          , Lars Borin, and
          <string-name>
            <given-names>Adam</given-names>
            <surname>Jatowt</surname>
          </string-name>
          .
          <article-title>Survey of computational approaches to diachronic conceptual change</article-title>
          .
          <source>arXiv preprint arXiv:1811.06278</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Zijun</given-names>
            <surname>Yao</surname>
          </string-name>
          , Yifan Sun, Weicong Ding,
          <string-name>
            <given-names>Nikhil</given-names>
            <surname>Rao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Hui</given-names>
            <surname>Xiong</surname>
          </string-name>
          .
          <article-title>Dynamic word embeddings for evolving semantic discovery</article-title>
          .
          <source>In The 11th ACM Conference on Web Search and Data Mining</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>