<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantic representation of Slovak words</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Šimon Horvát</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Stanislav Krajči</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Ľubomír Antoni</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Computer Science, Pavol Jozef Šafárik University in Košice</institution>
          ,
          <country country="SK">Slovakia</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The first problem one encounters when trying to apply analytical methods to text data is probably how to represent it in a way that is amenable to operations such as similarity, composition, etc. Recent methods for learning vector space representations of words have succeeded in capturing semantics using vector arithmetic; however, all of these methods need a lot of text data for representation learning. In this paper, we focus on a representation of Slovak words that captures semantic information, but as the data source we use a dictionary, since a public corpus of the required size is not available. The main idea is to represent the information from the dictionary as a word network and to learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighbourhoods of word nodes.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Recently, the area of natural language processing (NLP)
has undergone a major transformation; even this area did not escape the strong
influence of the rise of neural networks. In most classical NLP
tasks, such as text classification, machine translation, and
sentiment analysis, good results are achieved thanks to
deep learning-based representations of the fundamental
building blocks of language – words. Neural networks use
large corpora for word representation learning [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ][
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>However, in some languages, it is not possible to use
this approach because large amounts of unstructured text
data do not exist for them. These issues can be well
illustrated by Google translations in some less widely
spoken languages (see Figure 1). The reason for poor
translations is the absence of a large public source of text data.</p>
      <p>In this paper, we propose an approach that aims to
obtain a semantic representation of words based on
public dictionaries instead of a corpus. The main idea is
to construct a graph G = (V, E, f) whose vertices are the
words for which we want to get vector representations and
whose edges are relationships between words; that is, words
that are closely related are connected by an edge. The
intensity of this relationship is expressed by a weight – a
real number from 0 to 1. As our source of data for
building the graph G, we used dictionaries; we describe the details
in the section Data processing. We use the tf-idf statistic
for weighting our word network. After building the graph G
with the mentioned properties, we apply Node2Vec, a well-known
feature learning algorithm for networks. This method
represents nodes as vectors, which are suitable for use as word
vectors with semantic properties.</p>
      <p>We focus on processing Slovak words in order to advance
the automated processing (NLP) of the Slovak language, but
the approach can be used for any other language. The paper is
organized as follows: in Section 2 – Related works, we present
the basic definitions and briefly describe methods from
related works. Section 3 – Data processing, explores our
data sources and their processing for the usage of our
method. In Section 4 – Methods, we propose a novel
approach, which produces dense vector representations of
words with semantic properties. Some results are shown
in Section 5 – Experiments, and key takeaway ideas are
discussed in the last Section 6 – Conclusion.</p>
    </sec>
    <sec id="sec-2">
<title>2 Related works</title>
      <p>Semantic vector space models of language represent each
word with a real-valued vector. These representations are
now commonly called word embeddings. The vectors can
be used as features in a variety of applications, as stated in
the previous section. In an ideal semantic vector space, the
distance between any two vectors should reflect how closely
the meanings of the corresponding words are related. The goal
is to achieve an approximation of this vector space.</p>
      <p>
        Word embeddings are commonly ([
        <xref ref-type="bibr" rid="ref8">8</xref>
        ][
        <xref ref-type="bibr" rid="ref9">9</xref>
        ][
        <xref ref-type="bibr" rid="ref10">10</xref>
        ])
categorized into two types, depending upon the strategies used
to induce them. Methods that leverage local data (e.g. a
word’s context) are called prediction-based models and are
generally reminiscent of neural language models. On the
other hand, methods that use global information, generally
corpus-wide statistics such as word counts and frequencies,
are called count-based models [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>Both types of models have their advantages and
disadvantages. However, a significant drawback of both
approaches for not widely spoken language is the need for
a large corpus. In the following sections, we show how
can be this problem solved.</p>
      <p>
        Prediction-based models. The idea of this approach is to
learn word representations that aid in making predictions
within local context windows (Figure 2). For example,
Mikolov et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] have introduced two models for
learning embeddings, namely the continuous bag-of-words
(CBOW) and skip-gram (SG) models (Figure 3). The
main difference between CBOW and SG lies in the loss
function used to update the model: while CBOW trains
a model that aims to predict the center word based upon
its context, in SG the roles are reversed, and the center
word is, instead, used to predict each word appearing in
its context.
      </p>
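      <p>To make this distinction concrete, the following minimal
sketch (assuming the gensim library; the toy sentences are merely
illustrative) trains both variants – the sg flag switches between
the two loss functions described above:</p>
      <preformat>
from gensim.models import Word2Vec

# Toy tokenized corpus; a real setting would use millions of sentences.
sentences = [["slnko", "svieti", "na", "oblohe"],
             ["mesiac", "svieti", "v", "noci"]]

# sg=0 trains CBOW (context predicts the center word);
# sg=1 trains skip-gram (center word predicts each context word).
cbow = Word2Vec(sentences, vector_size=50, window=2, sg=0, min_count=1)
skipgram = Word2Vec(sentences, vector_size=50, window=2, sg=1, min_count=1)

vector = skipgram.wv["slnko"]  # 50-dimensional embedding of "slnko"
      </preformat>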
      <p>
        Count-based models. These models are another way
of producing word embeddings, not by training
algorithms that predict the next word given its context but by
leveraging word-context co-occurrence counts globally in
a corpus. These are very often represented (Turney and
Pantel (2010) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]) as word-context matrices. The earliest
relevant example of leveraging word-context matrices to
produce word embeddings is Latent Semantic Analysis
(LSA) (Deerwester et al. (1990) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]) where Singular
value decomposition (SVD) is applied [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
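      <p>As a minimal sketch of the count-based route (assuming
scikit-learn; the three documents are toy data), one can build a
document-term count matrix and factorize it with truncated SVD,
as in LSA:</p>
      <preformat>
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["the brown cow", "the brown fox jumps", "a cow eats grass"]
counts = CountVectorizer().fit_transform(docs)  # documents x terms counts

svd = TruncatedSVD(n_components=2)        # low-rank factorization, as in LSA
doc_vectors = svd.fit_transform(counts)   # dense document vectors
term_vectors = svd.components_.T          # dense term (word) vectors
      </preformat>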
    </sec>
    <sec id="sec-3">
<title>3 Data processing</title>
      <p>
        As we already mentioned, we use dictionaries as our data
source instead of corpora. First of all, we find a web page
that provides dictionaries with public access and pull the
data out of the HTML (it is also possible to use dictionaries in
text format). We parse two types of dictionary:
1. a synonym dictionary [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ],
2. a classic dictionary [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] that contains a list of words
and their meanings.
      </p>
      <p>First, we establish some notation. Let VOCAB be the set
of all words that we want to represent as vectors.</p>
      <p>
        Let S represent the set of all synonym pairs obtained
from the Synonym dictionary [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Set S contains pairs
like
(vtipný, zábavný), (vtipný, smiešny), (rýchlo, chytro), …
(“witty, amusing”, “witty, funny”, “quickly, swiftly”).
It is important to remark that not every word from
VOCAB has a synonym pair. Let L represent the set of
word pairs from the Dictionary [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>We create these word pairs (w, li) as follows:</p>
      <p>
        For each word w from VOCAB, we find its definition in
the Dictionary [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>Subsequently, we find the lemma of each word
occurring in this definition of w. Let us denote these
lemmas by l1, l2, …, ln. For each word w from VOCAB,
the pairs (w, l1), (w, l2), …, (w, ln) are in set L. For
instance, the word slnko (sun) has the definition: “Nebeské
teleso vysielajúce do vesmíru teplo a svetlo.” (“A celestial
body emitting heat and light into space.”) Based on that,
we add these pairs to set L: (slnko, nebeský), (slnko,
teleso), (slnko, vysielajúce), (slnko, vesmír), (slnko,
teplo), (slnko, svetlo).</p>
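      <p>A minimal sketch of this construction (the parsed dictionary
entries are toy data, and lemmatize is a hypothetical stand-in for
the rule-based lemmatizer referenced below):</p>
      <preformat>
# Toy parsed dictionaries; the real data comes from [14] and [13].
synonym_dict = {"vtipný": ["zábavný", "smiešny"], "rýchlo": ["chytro"]}
definition_dict = {
    "slnko": "nebeské teleso vysielajúce do vesmíru teplo a svetlo"
}

def lemmatize(word):
    # Hypothetical stand-in: a real implementation maps each word
    # form to its lemma.
    return {"nebeské": "nebeský"}.get(word, word)

# S: synonym pairs; L: (word, definition-lemma) pairs.
S = {(w, syn) for w, syns in synonym_dict.items() for syn in syns}
L = {(w, lemmatize(token))
     for w, definition in definition_dict.items()
     for token in definition.split()}
      </preformat>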
      <p>
        We used a rule-based tool for lemmatization [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ][
        <xref ref-type="bibr" rid="ref17">17</xref>
        ][
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>Let G = (V, E, f) denote a directed graph
where V = VOCAB, the edges are E = S ∪ L, and f is the
function that assigns a real number f(e) to each edge e from E.
We will define the function f in Section 4.1.</p>
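      <p>Continuing the toy sketch above, the word network can be
assembled with the networkx library (the edge weights are
placeholders until f is defined in Section 4.1):</p>
      <preformat>
import networkx as nx

VOCAB = {word for pair in (S | L) for word in pair}

G = nx.DiGraph()
G.add_nodes_from(VOCAB)               # V = VOCAB
for (w1, w2) in (S | L):              # E = S ∪ L
    G.add_edge(w1, w2, weight=1.0)    # placeholder weight for now
      </preformat>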
      <p>From now on, our initial task of word representation
learning is transformed into a graph-mining problem.</p>
    </sec>
    <sec id="sec-4">
<title>4 Methods</title>
      <p>
        In this section, we present the tf-idf method and the
Node2Vec algorithm [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
4.1 tf-idf
The notion tf-idf stands for term frequency-inverse
document frequency, and the tf-idf weight is a weight often
used in information retrieval and text mining. This weight
is a statistical measure used to evaluate how important a
word is to a document in a collection or corpus. The
importance increases proportionally to the number of times
a word appears in the document but is offset by the
frequency of the word in the corpus. Variations of the tf-idf
weighting scheme are often used by search engines as a
central tool in scoring and ranking a document’s relevance
given a user query. The tf-idf can be successfully used
for stop-words filtering in various subject fields, including
text summarization and classification. The tf-idf is the
product of two statistics, term frequency and inverse
document frequency [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>Term frequency. Suppose we have a set of English text
documents and wish to rank which document is most
relevant to the query, "the brown cow". A simple way to
start out is by eliminating documents that do not contain
all three words "the", "brown", and "cow", but this still
leaves many documents. To further distinguish them, we
might count the number of times each term occurs in each
document; the number of times a term occurs in a
document is called its term frequency. In the case of the term
frequency tf(t; d), the simplest choice is to use the raw
count of a term in a document, i.e., the number of times
that term t occurs in document d.</p>
      <p>Inverse document frequency. Because the term "the" is so
common, term frequency will tend to incorrectly
emphasize documents that happen to use the word "the" more
frequently, without giving enough weight to the more
meaningful terms "brown" and "cow". The term "the"
is not a good keyword to distinguish relevant and
nonrelevant documents and terms, unlike the less-common
words "brown" and "cow". Hence an inverse document
frequency factor is incorporated which diminishes the
weight of terms that occur very frequently in the document
set and increases the weight of terms that occur rarely. So
the inverse document frequency is a measure of how much
information the word provides, i.e., if it’s common or rare
across all documents. It is the logarithmically scaled
inverse fraction of the documents that contain the word
(obtained by dividing the total number of documents by the
number of documents containing the term, and then taking
the logarithm of that quotient):
idf(t, D) = log( N / |{d ∈ D : t ∈ d}| )   (1)</p>
      <p>with N = |D| the total number of documents in the corpus,
and |{d ∈ D : t ∈ d}| the number of documents in which the
term t appears.</p>
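      <p>Equation (1) transcribes directly into code; here D is a list
of tokenized documents (in our setting, dictionary definitions):</p>
      <preformat>
import math

def idf(term, D):
    # |{d ∈ D : t ∈ d}|: the number of documents containing the term;
    # assumes the term occurs in at least one document of D.
    n_containing = sum(1 for d in D if term in d)
    return math.log(len(D) / n_containing)
      </preformat>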
      <p>Term frequency–Inverse document frequency. tf-idf is
calculated as</p>
      <p>tf-idf(t, d, D) = tf(t, d) · idf(t, D).
A high weight in tf-idf is reached by a high term
frequency (in the given document) and a low document
frequency of the term in the whole collection of documents;
the weights hence tend to filter out common terms. Since
the ratio inside the idf’s log function is always greater
than or equal to 1, the value of idf (and tf-idf) is greater
than or equal to 0. As a term appears in more documents,
the ratio inside the logarithm approaches 1, bringing the
idf and tf-idf closer to 0.
tf-idf as a weight function. Let us consider the word železný
(“iron”, adj.) and its definition “obsahujúci železo; majúci istý
vzťah k železu” (“containing iron; having a certain relation to
iron”). For Slovak readers, it is obvious that not every
word of the definition is related to the word železný in the
same way. Some words are very important for the
construction of the definition (obsahujúci, mať, istý) but they are
not related to the defined word. By the definition of our
word network G, all lemmas will be joined with the word
železný by an edge, but we can filter out unrelated words by
assigning them a low weight.</p>
      <p>tf(t, d) is the number of times that word t occurs in
definition d. For instance, tf(železo, “obsahujúci
železo; mať istý vzťah železo”) = 2.
idf(t, D) is the inverse document frequency defined in
(1), where D is the set of all definitions from the
Dictionary,
– N is the total number of definitions,
– and |{d ∈ D : t ∈ d}| is the number of definitions
where the word t appears.</p>
      <p>The definition implies that words that appear often in
definitions (such as “majúci” or “nejaký”) have a low idf
value. So the relationship between word w and li (the lemma
of the i-th word that appears in the definition dw of word w) is
given by the value tf-idf(w, li) = tf(li, dw) · idf(li, D).
tf-idf is our weight function if edge e joins words w1
and w2, where w2 is the lemma of a word that appears
in the definition dw1 of word w1; in other words, if e is from L. If
edge e joins synonymous words (e ∈ S), the weight of e is 1,
the maximum weight value. If e belongs to L but also to S,
f(e) = 1:</p>
      <p>f(e) = f(w1, w2) =
1, if e ∈ S,
tf-idf(w1, w2), if e ∈ L,
1, if e ∈ S ∩ L.</p>
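      <p>A sketch of this weight function under the definitions above
(note that raw tf-idf values are not bounded by 1, so a rescaling
into [0, 1] is assumed where the weights require it):</p>
      <preformat>
def tf(term, definition_tokens):
    # Raw count of the term in the tokenized, lemmatized definition.
    return definition_tokens.count(term)

def f(edge, S, L, definitions, D):
    # definitions maps a word to its tokenized, lemmatized definition;
    # D is the list of all definitions (used by idf above).
    w1, w2 = edge
    if edge in S:          # covers both e ∈ S and e ∈ S ∩ L
        return 1.0
    return tf(w2, definitions[w1]) * idf(w2, D)   # e ∈ L
      </preformat>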
      <p>
        4.2 Node2Vec
In the previous sections, we have described building the graph
G = (V, E, f) that captures the semantic relationships
between words. Finally, we need to obtain a vector
representation of each node of the graph. We use the Node2Vec
algorithm for this purpose [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The Node2Vec
framework learns low-dimensional representations for nodes in
a graph through the use of random walks. Given any
graph, it can learn continuous feature representations for
the nodes, which can then be used for various downstream
machine learning tasks. Node2Vec follows the intuition
that random walks through a graph can be treated like
sentences in a corpus (the sampling strategy). Each node in
a graph is treated like an individual word, and a random
walk is treated as a sentence. When we have a sufficiently
large corpus obtained by random walks through the graph,
the next step of the algorithm is to use a traditional
embedding technique to obtain the vector representation (see
Figure 4); specifically, Node2Vec uses the already mentioned
skip-gram model (Figure 3). The Node2Vec algorithm works in
two steps: the sampling strategy and feeding the skip-gram model.
Since we have already mentioned the skip-gram model, we will
focus on the sampling strategy.
      </p>
      <p>Node2Vec’s sampling strategy accepts four arguments:
– number of walks n: the number of random walks to be
generated from each node in the graph,
– walk length l: how many nodes are in each random
walk,
– p: the return hyperparameter,
– q: the in-out hyperparameter.</p>
      <p>The first two hyperparameters are self-explanatory: the
random walk generation algorithm will go over
each node in the graph and generate n random walks
of length l from it.</p>
      <p>Return parameter p controls the likelihood of
immediately revisiting a node in the walk. Setting it to a high
value ensures that we are less likely to sample an already
visited node in the following two steps (unless the next
node in the walk had no other neighbor). This strategy
encourages moderate exploration and avoids 2-hop
redundancy in sampling. On the other hand, if p is low, it would
lead the walk to backtrack a step and this would keep the
walk "local" close to the starting node u.</p>
      <p>
        In-out parameter q allows the search to differentiate
between "inward" and "outward" nodes. Going back to
Figure 5, if q &gt; 1, the random walk is biased towards nodes
close to node t. Such walks obtain a local view of the
underlying graph with respect to the start node in the walk
and approximate BFS (Breadth First Search Traversal)
behavior in the sense that our samples comprise nodes
within a small locality. In contrast, if q &lt; 1, the walk is
more inclined to visit nodes that are further away from the
node t. Such behavior is reflective of DFS (Depth First
Search Traversal), which encourages outward exploration.
However, an essential difference here is that we achieve
DFS-like exploration within the random walk framework.
Hence, the sampled nodes are not at strictly increasing
distances from a given source node u, but in turn, we benefit
from tractable preprocessing and superior sampling
efficiency of random walks [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>For weighted graphs (our case), the weight of an edge
affects the probability of visiting a node (the higher the
weight, the higher the probability of visiting).</p>
    </sec>
    <sec id="sec-5">
<title>5 Experiments</title>
      <p>The word similarity measure is one of the most frequently
used approaches to validate word vector representations.
The word similarity evaluator correlates the distance
between word vectors and human perceived semantic
similarity. The goal is to measure how well the notion of
human perceived similarity is captured by the word vector
representations.</p>
      <p>One commonly used evaluator is the cosine similarity,
defined by
cos(wx, wy) = (wx · wy) / (‖wx‖ ‖wy‖),
where wx and wy are two word vectors and ‖wx‖ and ‖wy‖
are their L2 norms.</p>
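      <p>With numpy, this evaluator reads:</p>
      <preformat>
import numpy as np

def cos(wx, wy):
    # Cosine similarity of two word vectors.
    return float(np.dot(wx, wy) / (np.linalg.norm(wx) * np.linalg.norm(wy)))
      </preformat>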
      <p>This test computes the correlation between all vector
dimensions, independent of their relevance for a given word
pair or a semantic cluster. Many datasets have been created for
word similarity evaluation; unfortunately, no such dataset
exists for the Slovak language. In Table 1, we present
several words and the lists of their 20 nearest words,
which are the results of our proposed model. We use
cosine similarity as our distance metric and the following
settings of the Node2Vec algorithm:</p>
      <p>– number of walks n: 20,
– walk length l: 100,
– p: 10,
– q: 1.</p>
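      <p>A sketch of this configuration using the community node2vec
package (a Python implementation of [1]); the embedding dimension
and the skip-gram window are our illustrative choices, since the
text does not fix them:</p>
      <preformat>
from node2vec import Node2Vec

# G is the weighted word network from Section 3; edge weights bias
# the random walks (the higher the weight, the higher the visiting
# probability).
n2v = Node2Vec(G, dimensions=64, num_walks=20, walk_length=100,
               p=10, q=1, weight_key="weight")
model = n2v.fit(window=5, min_count=1)             # skip-gram over the walks
nearest = model.wv.most_similar("slnko", topn=20)  # 20 nearest words
      </preformat>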
      <p>Several words in Table 1 have multiple meanings. For
example, kobylka has three meanings:
1. a small mare (diminutive of kobyla, a female horse),
2. a jumping meadow insect (a grasshopper),
3. a string-supporting component of musical
instruments (a bridge).</p>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusion</title>
      <p>In this work, we offer a solution to the problem of a lack of
text data for building word embeddings. As a data source,
we use a dictionary instead of a corpus. From the data, we
have constructed a word network whose nodes we transform
into the vector space. In the section
Experiments, we show that the word vectors capture semantic
information, which is the main idea behind word embeddings.
In addition, we have shown that the vector space captures
several senses of words with multiple meanings.</p>
      <p>A possible extension of this work is to enrich our
vector space with grammatical information as well (vectors of
adjectives would be closer to each other than vectors of an
adjective and a verb). As we already mentioned, our graph
contains only word lemmas, but it is also possible to add
other forms of a word into the vector space.</p>
      <p>Table 1: Selected words and their 20 nearest words by cosine similarity.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Grover</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
<article-title>: node2vec: Scalable feature learning for networks</article-title>
          .
          <source>In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ramos</surname>
          </string-name>
          :
          <article-title>Using tf-idf to determine word relevance in document queries</article-title>
          .
          <source>In ICML</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Robertson</surname>
          </string-name>
          :
          <article-title>Understanding inverse document frequency: On theoretical arguments for IDF</article-title>
          .
          <source>Journal of Documentation</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ch. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          :
          <article-title>Glove - Global vectors for word representation</article-title>
          .
          <source>In Conference on Empirical Methods on Natural Language Processing (EMNLP)</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          , G. Corrado, and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          :
          <article-title>Efficient Estimation of Word Representations in Vector Space</article-title>
          .
          <source>In ICLR Workshop Papers 2013a</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xexéo</surname>
          </string-name>
          <article-title>: Word embeddings - a survey</article-title>
          .
          <source>arXiv preprint arXiv:1901</source>
          .
          <volume>09069</volume>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hazarika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Poria</surname>
          </string-name>
          , E. Cambria:
          <article-title>Recent trends in deep learning based natural language processing</article-title>
          .
          <source>IEEE Computational Intelligence Magazine</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Baroni</surname>
          </string-name>
          , G. Dinu, and
          <string-name>
            <given-names>G.</given-names>
            <surname>Kruszewski</surname>
          </string-name>
          <article-title>: Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors</article-title>
          .
          <source>In Proceedings of the 52nd Annual</source>
          <article-title>Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</article-title>
          .
          <source>Association for Computational Linguistics</source>
          ,
          <year>June 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ch. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          :
          <article-title>Semi-supervised recursive autoencoders for predicting sentiment distributions</article-title>
          .
          <source>In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11. Association for Computational Linguistics</source>
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ch.</given-names>
            <surname>Miao</surname>
          </string-name>
          :
          <article-title>A generative word embedding model and its low-rank positive semidefinite solution</article-title>
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Deerwester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Dumais</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. W.</given-names>
            <surname>Furnas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. K.</given-names>
            <surname>Landauer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Harshman</surname>
          </string-name>
          : Indexing by latent semantic analysis,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bollacker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kurt</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          <article-title>Lee: An Autonomous Web Agent for Automatic Retrieval and Identification of Interesting Publications</article-title>
          . Proceedings of the Second International Conference on Autonomous Agents.
          <source>AGENTS '98</source>
          . pp.
          <fpage>116</fpage>
          -
          <lpage>123</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Ľ.</given-names>
            <surname>Štúr</surname>
          </string-name>
          <article-title>Institute of Linguistics of the Slovak Academy of Sciences (SAS): Krátky slovník slovenského jazyka 4. 2003 - kodifikačná príručka</article-title>
          . Retrieved from https://slovnik.aktuality.sk/
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Ľ.</given-names>
            <surname>Štúr</surname>
          </string-name>
          <article-title>Institute of Linguistics of the Slovak Academy of Sciences (SAS): Synonymický slovník slovenčiny</article-title>
          ,
          <year>2004</year>
          . Retrieved from https://slovnik.aktuality.sk/
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P. D.</given-names>
            <surname>Turney</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
          <article-title>Pantel: From Frequency to Meaning: Vector Space Models of Semantics</article-title>
          .
          <source>In Journal of Artificial Intelligence Research</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
<surname>S. Krajči</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Novotný</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Turlíková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Laclavík</surname>
          </string-name>
          <article-title>: The tool Morphonary/Tvaroslovník: Using of word lemmatization in processing of documents in Slovak</article-title>
          , in: P.
          <string-name>
            <surname>Návrat</surname>
          </string-name>
          , D. Chudá (eds.),
          <source>Proceedings Znalosti</source>
          <year>2009</year>
          ,
          <string-name>
<surname>Vydavateľstvo</surname>
            <given-names>STU</given-names>
          </string-name>
          , Bratislava,
          <year>2009</year>
          , s.
          <fpage>119</fpage>
          -
          <lpage>130</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
<surname>S. Krajči</surname>
          </string-name>
          , R. Novotný
          <article-title>- databáza tvarov slov slovenského jazyka, Informačné technológie - Aplikácie a teória, zborník príspevkov z pracovného seminára</article-title>
          ITAT,
          <volume>17</volume>
          .-
          <fpage>21</fpage>
          . september 2012,
          <article-title>Monkova dolina (Slovensko), Košice</article-title>
          ,
          <string-name>
            <surname>SAIS</surname>
          </string-name>
          , Slovenská spoločnosť pre umelú inteligenciu,
          <year>2012</year>
          , ISBN 9788097114411, s.
          <fpage>57</fpage>
          -
          <lpage>61</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
<surname>S. Krajči</surname>
          </string-name>
          , R. Novotný:
          <article-title>Projekt Tvaroslovník - slovník všetkých tvarov všetkých slovenských slov</article-title>
          ,
          <source>Znalosti</source>
          <year>2012</year>
          ,
          <source>zborník príspevkov 11. ročníka konferencie: 14</source>
          . -
          <fpage>16</fpage>
          . október 2012,
          <article-title>Mikulov (Česko)</article-title>
          , Praha,
          <string-name>
            <surname>MATFYZPRESS</surname>
          </string-name>
          , Vydavatelství MFF UK v Praze,
          <year>2012</year>
          , ISBN 9788073782207, s.
          <fpage>109</fpage>
          -
          <lpage>112</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>C.</given-names>
            <surname>McCormick</surname>
          </string-name>
          <article-title>: Word2Vec Tutorial - The Skip-Gram Model</article-title>
          . Retrieved from http://www.mccormickml.com,
          <year>2016</year>
          , April 19.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Christian</surname>
          </string-name>
          <article-title>: Why Is Google Translate Spitting Out Sinister Religious Prophecies?</article-title>
          Retrieved from https://www.vice.com/en_us,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>E.</given-names>
            <surname>Cohen</surname>
          </string-name>
          <article-title>: node2vec: Embeddings for Graph Data</article-title>
          . Retrieved from https://towardsdatascience.com/,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>