<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Embeddings Shifts as Proxies for Different Word Use in Italian Newspapers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michele Cafagna</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorenzo De Mattei</string-name>
          <email>lorenzo.demattei@di.unipi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Malvina Nissim</string-name>
          <email>m.nissim@rug.nl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Pisa</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ItaliaNLP Lab, ILC-CNR</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Groningen</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We study how words are used differently in two Italian newspapers at opposite ends of the political spectrum by training embeddings on one newspaper's corpus, updating the weights on the second one, and observing vector shifts. We run two types of analysis, one top-down, based on a preselection of frequent words in both newspapers, and one bottom-up, on the basis of a combination of the observed shifts and relative and absolute frequency. The analysis is specific to this data, but the method can serve as a blueprint for similar studies.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Different newspapers, especially if positioned at opposite ends of the political spectrum, can render the same event in different ways. In Example (1), both headlines are about the leader of the Italian political movement “Cinque Stelle” splitting up with his girlfriend, but the Italian left-oriented newspaper la Repubblica (https://www.repubblica.it; rep in the examples) and the right-oriented Il Giornale (http://www.ilgiornale.it; gio in the examples) describe the news quite differently. The news in Example (2), about a baby-sitter killing a child in Moscow, is also reported by the two newspapers mentioning and stressing different aspects of the same event.</p>
      <p>Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>(1) gio Luigino single, è finita la Melodia
[en: Luigino single, the Melody is over]
(2) rep Mosca, “la baby sitter omicida non ha agito da sola”
[en: Moscow, “the killer baby-sitter has not acted alone”]
gio Mosca, la donna killer: “Ho decapitato la bimba perché me l’ha ordinato Allah”
[en: Moscow, the killer woman: “I have beheaded the child because Allah has ordered me to do it”]</p>
      <p>Often, though, the same words are used, but with distinct nuances, or in combination with other, different words, as in Examples (3)–(4):
(3) rep Usa: agente uccide un nero disarmato e immobilizzato
[en: Usa: policeman kills an unarmed and immobilised black guy]
gio Oklahoma, poliziotto uccide un nero disarmato: “Ho sbagliato pistola”
[en: Oklahoma: policeman kills an unarmed black guy: “I used the wrong gun”]
(4) rep Corte Sudan annulla condanna, Meriam torna libera
[en: Sudan Court cancels the sentence, Meriam is free again]
gio Sudan, Meriam è libera: non sarà impiccata perché cristiana
[en: Sudan: Meriam is free: she won’t be hanged because Christian]</p>
      <p>In this work we discuss a method to study how the same words are used differently in two sources, exploiting vector shifts in embedding spaces.</p>
      <p>
        The two embedding models built on data coming from la Repubblica and Il Giornale might contain interesting differences, but since they are separate spaces they are not directly comparable.
Previous work has encountered this issue from
a diachronic perspective: when studying
meaning shift in time, embeddings built on data from
different periods would encode different usages,
but they need to be comparable. Instead of
constructing separate spaces and then aligning them
        <xref ref-type="bibr" rid="ref3">(Hamilton et al., 2016b)</xref>
        , we adopt the method
used by Kim et al. (2014) and subsequently by Del
Tredici et al. (2016) for Italian, whereby
embeddings are first trained on a corpus, and then
updated with a new one; observing the shifts certain
words undergo through the update is a rather
successful method to proxy meaning change.
      </p>
      <p>Rather than across time, we update
embeddings across sources which are identical in genre
(newspapers) but different in political positioning.
Specifically, we train embeddings on articles
coming from the newspaper La Repubblica (leaning
left) and update them using articles coming from
the newspaper Il Giornale (leaning right). We take
the observed shift of a given word (or the shift in
distance between two words) as a proxy for a
difference in usage of that term, running two types
of analysis. One is top-down, and focuses on a
set of specific words which are frequent in both
corpora. The other is bottom-up, focusing on words that appear potentially interesting on the basis of measures that combine the observed shift with both relative and absolute frequency. As a byproduct, we also learn something about the interaction between shifts and frequency.</p>
    </sec>
    <sec id="sec-2">
      <title>Data</title>
      <p>We scraped articles from the online sites of the
Italian newspapers la Repubblica and Il Giornale.
We concatenated each article to its headline, and
obtained a total of 276,120 documents (202,419
for Il Giornale and 73,701 for la Repubblica).</p>
      <p>For training the two word embeddings, though,
we only used a selection of the data. Since we are
interested in studying how the usage of the same
words changes across the two newspapers, we
wanted to maximise the chance of articles from the
two newspapers being on the same topic. Thus, we
implemented an automatic alignment, and retained
only the aligned news for each of the two corpora.
All embeddings are trained on these aligned news articles.</p>
      <sec id="sec-2-1">
        <title>Alignment</title>
        <p>We align the two datasets using the whole body of
the articles. We compute the tf-idf vectors for all
the articles of both newspapers and create subsets
of relevant news by filtering on date, i.e. considering only news items published within three days of one another. Once this subset is extracted, we compute cosine similarities between all news in one corpus and all news in the other using the tf-idf vectors, rank them, and then filter out alignments whose cosine similarity falls below a given threshold. The threshold should be chosen considering the trade-off between keeping a sufficient number of documents and the quality of the alignment. Here we are content with a good but not overly strict alignment; after a few tests and manual checks, we found that a threshold of 0.185 works well in
practice for these datasets, yielding a good balance
between correct alignments and news recall. Table 1
shows the size of the aligned corpus in terms of
number of documents and tokens.</p>
        <p>newspaper
la Repubblica
Il Giornale
#documents
If we look at the most frequent content words in
the datasets (Figure 1), we see that they are indeed
very similar, most likely due to the datasets being
aligned based on lexical overlap.</p>
        <p>This selection of frequent words already
constitutes a set of interesting tokens to study for their
potential usage shift across the two newspapers.
In addition, through the updating procedure that
we describe in the next section, we will be able to
identify which words appear to undergo the
heaviest shifts from the original to the updated space,
possibly indicating a substantial difference of use
across the two newspapers.
Seeing that frequent words are shared across the two datasets, we want to ensure that the datasets are still different enough to make the embedding update meaningful.</p>
        <p>
          We therefore run a simple classification
experiment to assess how distinguishable the two
sources are based on lexical features. Using the
scikit-learn implementation with default
parameters
          <xref ref-type="bibr" rid="ref7">(Pedregosa et al., 2011)</xref>
          , we trained a binary linear SVM to predict whether a given document comes from la Repubblica or Il Giornale. We used ten-fold cross-validation over the aligned dataset, with word 1-2-grams as the only features, and obtained an overall accuracy of 0.796, with average precision and recall of 0.794 and 0.797, respectively. This indicates that the two newspapers can be distinguished even when writing about the same topics. Looking at the predictive features, we can indeed see words that might be characterising each of the newspapers due to their higher tf-idf weight, thus maintaining distinctive contexts even within similar topics and with frequent shared words.
        </p>
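        <p>The classification check can be sketched as follows; a minimal reconstruction with scikit-learn defaults, where any setup detail beyond those stated above (e.g. the exact vectoriser) is an assumption.
```python
# Minimal sketch of the source-classification experiment: a linear SVM over
# word 1-2-gram counts, evaluated with ten-fold cross-validation.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def source_accuracy(docs, labels):
    clf = make_pipeline(
        CountVectorizer(ngram_range=(1, 2)),  # word n-grams 1-2
        LinearSVC(),
    )
    return cross_val_score(clf, docs, labels, cv=10, scoring="accuracy").mean()
```
</p>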
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Embeddings and Measures</title>
      <p>
        We train embeddings on one source, and update
the weights training on the other source.
Specifically, using the gensim library
        <xref ref-type="bibr" rid="ref8">(Řehůřek and Sojka, 2010)</xref>
        , first we train a word2vec model
        <xref ref-type="bibr" rid="ref6">(Mikolov et al., 2013)</xref>
        to learn 128 sized vectors on
la Repubblica corpus (using the skip-gram model,
window size of 5, high-frequency word
downsample rate of 1e-4, learning rate of 0.05 and
minimum word frequency 3, for 15 iterations). We
call these word embeddings spaceR. Next, we
update spaceR on the documents of Il Giornale with
identical settings but for 5 iterations rather than 15.
The resulting space, spaceRG, has a total vocabulary size of 53,684 words. We chose this direction (rather than training on Il Giornale first and updating on la Repubblica) because the la Repubblica corpus is larger in terms of tokens, thus ensuring a more stable space to start from.
      </p>
      <sec id="sec-3-1">
        <title>Quantifying the shift</title>
        <p>
          This procedure makes it possible to observe the
shift of any given word, both quantitatively as well
as qualitatively. This is more powerful than building two separate spaces and just checking the nearest neighbours of a selection of words. In the same
way that the distance between two words is
approximated by the cosine distance of their vectors
          <xref ref-type="bibr" rid="ref11">(Turney and Pantel, 2010)</xref>
          , we calculate the distance between a word in spaceR and the same word in spaceRG by taking the norm of the difference between the two vectors. This value for word w is referred to as shift_w. The higher shift_w, the larger the difference in usage of w across the two spaces. We observe an average shift of 1.98, with the highest value at 6.65. By looking at raw shifts and selecting high ones, we can already spot some potentially interesting words. However, frequency plays an important role, too
          <xref ref-type="bibr" rid="ref10">(Schnabel et al., 2015)</xref>
          . To account for this, we explore the impact of both absolute and relative frequency for each word w. We take the overall frequency of a word by summing the individual occurrences of w in the two corpora (total_w). We also take the difference between the relative frequencies of a word in the two corpora, as this might influence the shift. We refer to this difference as gap_w, and calculate it as in Equation 1, where freq_wr and freq_wg are the frequencies of w in la Repubblica and Il Giornale, and |r| and |g| are the sizes of the two corpora:

gap_w = log(freq_wr / |r|) - log(freq_wg / |g|)   (1)

A negative gap_w indicates that the word is relatively more frequent in Il Giornale than in la Repubblica, while a positive value indicates the opposite. Words whose relative frequency is similar in both corpora exhibit values around 0.
        </p>
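        <p>The two measures can be sketched as follows; the embedding spaces are assumed to behave as mappings from word to vector, and the frequency counts as plain dicts (all names are illustrative).
```python
# Sketch of the two measures: shift_w is the norm of the difference between
# a word's vectors in the two spaces; gap_w follows Equation 1.
import math
import numpy as np

def shift(space_r, space_rg, w):
    return float(np.linalg.norm(space_r[w] - space_rg[w]))

def gap(freq_rep, freq_gio, w, size_rep, size_gio):
    # negative values: relatively more frequent in Il Giornale
    return math.log(freq_rep[w] / size_rep) - math.log(freq_gio[w] / size_gio)
```
</p>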
        <p>We observe a tiny but significant negative correlation between total_w and shift_w (-0.093, p &lt; 0.0001), indicating that the more frequent a word, the less likely it is to shift. In Figure 2 we indeed see the dark dots (most frequent words) concentrated at the bottom of the scatter plot (lower shifts).</p>
        <p>However, when we consider gap_w and shift_w, we see a more substantial negative correlation (-0.306, p &lt; 0.0001), suggesting that the gap influences the shift: the more negative the gap, the higher the shift. In other words, the shift is larger if a word is relatively more frequent in the corpus used to update the embeddings.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Analysis</title>
      <p>We use the information that derives from having
the original spaceR and the updated spaceRG to
carry out two types of analysis. The first one is
top-down, with a pre-selection of words to study,
while the second one is bottom-up, based on
measures combining the shift and frequency.</p>
      <sec id="sec-4-1">
        <title>Top-down</title>
        <p>As a first analysis, we look into the most frequent
words in both newspapers and study how their
relationships change when we move from spaceR to
spaceRG. The words we analyse are the union of
those reported in Figure 1. Note that in this
analysis we look at pairs of words at once, rather than
at the shift of a single word from one space to the
next. We build three matrices to visualise the
distance between these words.</p>
        <p>The first matrix (Figure 3) only considers
SpaceR, and serves to show how close/distant the
words are from one another in la Repubblica. For
example, we see that “partito” and “Pd”, or
“premier” and “Renzi” are close (dark-painted), while
“polizia” and “europa” are lighter, thus more
distant (probably used in different contexts).</p>
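        <p>The matrices in Figures 3 and 4 can be sketched as pairwise cosine distances within a single space; a minimal illustration, assuming a mapping from words to vectors.
```python
# Sketch of a word-by-word cosine-distance matrix within one embedding space.
import numpy as np

def distance_matrix(space, words):
    vecs = np.array([space[w] for w in words], dtype=float)
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T  # cosine distance = 1 - cosine similarity
```
</p>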
        <p>In Figure 4 we show a replica of the first matrix, but now on spaceRG; this matrix lets us see how the distance between pairs of words has changed after updating the weights. Some vectors are farther apart than before, visible as lighter colours in the figure, like “usa” and “lega” or “italia” and “usa”, while some words are closer, like “Berlusconi” and “europa” or “europa” and “politica”, which feature darker colours. Specific analysis of the co-occurrences of such words could yield interesting observations on their use in the two newspapers.</p>
        <p>In order to better observe the actual difference, the third matrix (Figure 5) shows the shift from spaceR to spaceRG, normalised by the logarithm of the absolute difference between total_w1 and total_w2. Note that this normaliser does not correspond exactly to the gap measure in Equation 1, since it considers the difference between two words rather than the difference in occurrence of the same word in the two corpora. Lighter word pairs shifted more, thus suggesting different contexts and usage, for example “italia” and “lega”. Darker pairs, on the other hand, such as “Pd”-“partito”, are also interesting for deeper analysis, since their joint usage is likely to be quite similar in both newspapers.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Bottom-up</title>
        <p>Differently from the top-down analysis, here we do not look at how the relationship between pairs of pre-selected words changes, but rather at how a single word’s usage varies across
the two spaces. These words arise from the
interaction of gap and shif t, which yields various
scenarios. Words with a large negative gap
(relative frequency higher in Il Giornale) are likely to
shift more, but it’s probably more of an effect due
to increased frequency than a genuine shift. Words
that have a high gap (occurring relatively less in Il
Giornale) are likely to shift less, most likely since
adding a few contexts might not cause much shift.</p>
        <p>
          The most interesting cases are words whose relative frequency does not change across the two datasets, but which show a high shift. Zooming in on the words that have small gaps (-0.1 &lt; gap_w &lt; 0.1) provides us with a set of potentially interesting words, especially if their shift is higher than the average shift. We also require that words obeying the previous constraints occur more often than the average word frequency over the two corpora. Low-frequency words are in general less stable
          <xref ref-type="bibr" rid="ref10">(Schnabel et al., 2015)</xref>
          , suggesting that their shifts might not be reliable. High-frequency words shift globally less (cf. Figure 2), so a higher than average shift could be meaningful.
        </p>
        <p>Figure 6 shows the plot of words that have more or less the same relative frequency in the two newspapers (-0.1 &lt; gap_w &lt; 0.1) and an absolute cumulative frequency higher than average; we therefore infer that their higher than average shift is mainly due to a difference in usage. Some comments are provided next to the plot.</p>
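        <p>The bottom-up selection amounts to a simple filter; a sketch, assuming rows of (word, gap, shift, total) tuples over the vocabulary (the function name is hypothetical).
```python
# Sketch of the bottom-up selection: keep words with a small gap (between
# -0.1 and 0.1), an above-average shift, and an above-average total frequency.
def select_candidates(rows):
    mean_shift = sum(r[2] for r in rows) / len(rows)
    mean_total = sum(r[3] for r in rows) / len(rows)
    return [
        w for w, g, s, t in rows
        if 0.1 > abs(g) and s > mean_shift and t > mean_total
    ]
```
</p>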
        <p>These words can be the focus of a dedicated
study, and independently of the specific
observations that we can make in this context, this method
can serve as a way to highlight the hotspot words
that deserve attention in a meaning shift study.</p>
      </sec>
      <sec id="sec-4-3">
        <title>A closer look at nearest neighbours</title>
        <p>As a last, more qualitative, analysis, one can
inspect how the nearest neighbours of a given word
of interest change from one space to the next. In
our specific case, we picked a few words
(deriving them from the top-down, thus most frequent,
and bottom-up selections), and report in Table 2
their top five nearest neighbours in spaceR and in spaceRG. As in most analyses of this kind, one has to rely quite a bit on background and general knowledge to interpret the changes. If we look at “Renzi”, for example, a past Prime Minister from the party close to la Repubblica, we see that while in spaceR the top neighbours are all members of his own party, or the party itself (“Pd”), in spaceRG politicians from parties closer to Il Giornale, such as Berlusconi and Alfano, move closer to Renzi.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>We experimented with using embedding shifts as a tool to study how words are used in two different Italian newspapers. We focused on a pre-selection of high-frequency words shared by the two newspapers, and on another set of words highlighted as potentially interesting through a newly proposed methodology that combines observed embedding shifts with relative and absolute frequency. The most differently used words in the two newspapers are proper nouns of politically active individuals, as well as places and concepts that are highly debated on the political scene.</p>
      <p>Table 2 lists, for three words, the top five nearest neighbours in spaceR and in spaceRG (cosine similarity in parentheses):
“migranti” [en: migrants]. spaceR: barconi [large boats] (0.60), naufraghi [castaways] (0.57), disperati [wretches] (0.56), barcone [large boat] (0.55), carrette [wrecks] (0.53). spaceRG: eritrei [Eritreans] (0.61), Lampedusa (0.60), accoglienza [hospitality] (0.59), Pozzallo (0.58), extracomunitari [non-European] (0.57).
“Renzi” [past Prime Minister]. spaceR: Orfini (0.65), Letta (0.64), Cuperlo (0.63), Pd (0.62), Bersani (0.61). spaceRG: premier (0.60), Nazareno (0.59), Berlusconi (0.58), Cav (0.57), Alfano (0.56).
“politica” [en: politics]. spaceR: leadership (0.65), logica [logic] (0.64), miri [aspire to] (0.63), ambizione [ambition] (0.62), potentati [potentates] (0.61). spaceRG: tecnocrazia [technocracy] (0.60), democrazia [democracy] (0.59), partitica [of party] (0.58), democratica [democratic] (0.57), legalità [legality] (0.56).</p>
      <p>Besides the present showcase, we believe this methodology can be used more generally to highlight which words might deserve deeper, dedicated analysis when studying meaning change.</p>
      <p>
        One aspect that should be further investigated
is the role played by the methodology used for
aligning and/or updating the embeddings. As an
alternative to what we proposed, one could
employ different strategies to manipulate embedding
spaces towards highlighting meaning changes. For
example, Rodda et al. (2016) exploited
Representational Similarity Analysis
        <xref ref-type="bibr" rid="ref5">(Kriegeskorte and
Kievit, 2013)</xref>
        to compare embeddings built on
different spaces in the context of studying diachronic
semantic shifts in ancient Greek. Another
interesting approach, still in the context of diachronic
meaning change, but applicable to our datasets,
was introduced by Hamilton et al. (2016a), who
use both a global and a local neighborhood
measure of semantic change to disentangle shifts due
to cultural changes from purely linguistic ones.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We would like to thank the Center for
Information Technology of the University of Groningen
for providing access to the Peregrine high
performance computing cluster.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Marco</given-names>
            <surname>Del Tredici</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Malvina</given-names>
            <surname>Nissim</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Zaninello</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Tracing metaphors in time through self-distance in vector spaces</article-title>
          .
          <source>In Proceedings of the Third Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>William L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jure</given-names>
            <surname>Leskovec</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          . 2016a.
          <article-title>Cultural shift or linguistic drift? comparing two computational measures of semantic change</article-title>
          .
          <source>In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing</source>
          , volume
          <volume>2016</volume>
          , page
          <fpage>2116</fpage>
          . NIH Public Access.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>William L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jure</given-names>
            <surname>Leskovec</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          . 2016b.
          <article-title>Diachronic word embeddings reveal statistical laws of semantic change</article-title>
          .
          <source>In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , pages
          <fpage>1489</fpage>
          -
          <lpage>1501</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Yoon</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yi-I</given-names>
            <surname>Chiu</surname>
          </string-name>
          , Kentaro Hanaki, Darshan Hegde, and
          <string-name>
            <given-names>Slav</given-names>
            <surname>Petrov</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Temporal analysis of language through neural language models</article-title>
          .
          <source>In Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science</source>
          , pages
          <fpage>61</fpage>
          -
          <lpage>65</lpage>
          , Baltimore, MD, USA, June. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Nikolaus</given-names>
            <surname>Kriegeskorte</surname>
          </string-name>
          and
          <string-name>
            <given-names>Rogier A.</given-names>
            <surname>Kievit</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Representational geometry: integrating cognition, computation, and the brain</article-title>
          .
          <source>Trends in cognitive sciences</source>
          ,
          <volume>17</volume>
          (
          <issue>8</issue>
          ):
          <fpage>401</fpage>
          -
          <lpage>412</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Kai Chen,
          <string-name>
            <given-names>Greg S.</given-names>
            <surname>Corrado</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>Proceedings of Workshop at ICLR</source>
          ,
          <year>2013</year>
          ,
          <volume>01</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>12</volume>
          :
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Radim</given-names>
            <surname>Řehůřek</surname>
          </string-name>
          and
          <string-name>
            <given-names>Petr</given-names>
            <surname>Sojka</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Software Framework for Topic Modelling with Large Corpora</article-title>
          .
          <source>In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks</source>
          , pages
          <fpage>45</fpage>
          -
          <lpage>50</lpage>
          , Valletta, Malta, May. ELRA. http://is. muni.cz/publication/884893/en.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Martina Astrid</given-names>
            <surname>Rodda</surname>
          </string-name>
          , Marco S. G. Senaldi, and Alessandro Lenci
          .
          <year>2016</year>
          .
          <article-title>Panta rei: Tracking semantic change with distributional semantics in ancient greek</article-title>
          . In CLiC-it/EVALITA.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Tobias</given-names>
            <surname>Schnabel</surname>
          </string-name>
          , Igor Labutov, David Mimno, and
          <string-name>
            <given-names>Thorsten</given-names>
            <surname>Joachims</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Evaluation methods for unsupervised word embeddings</article-title>
          .
          <source>In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>298</fpage>
          -
          <lpage>307</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Peter D.</given-names>
            <surname>Turney</surname>
          </string-name>
          and
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Pantel</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>From frequency to meaning: Vector space models of semantics</article-title>
          .
          <source>Journal of artificial intelligence research</source>
          ,
          <volume>37</volume>
          :
          <fpage>141</fpage>
          -
          <lpage>188</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>