<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Lithuanian news clustering using document embeddings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lukas Stankevičius</string-name>
          <email>lukas.stankevicius@ktu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mantas Lukoševičius</string-name>
          <email>mantas.lukosevicius@ktu.lt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Informatics, Kaunas University of Technology</institution>
          ,
          <addr-line>Kaunas</addr-line>
          ,
          <country country="LT">Lithuania</country>
        </aff>
      </contrib-group>
      <fpage>104</fpage>
      <lpage>109</lpage>
      <abstract>
        <p>A lot of natural language processing research is done and applied on English texts, but relatively little is tried on less popular languages. In this article document embeddings are compared with traditional bag-of-words methods for Lithuanian news clustering. The results show that, given enough documents, the embeddings greatly outperform simple bag-of-words representations. In addition, optimal lemmatization, embedding vector size, and number of training epochs were investigated.</p>
      </abstract>
      <kwd-group>
        <kwd>document clustering</kwd>
        <kwd>document embedding</kwd>
        <kwd>lemmatization</kwd>
        <kwd>Lithuanian news articles</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>Knowledge and information are an inseparable part of our
civilization. For thousands of years, information from news of incoming
troops to ordinary know-how could have meant death or life.
Knowledge accumulation throughout the centuries has led to
astonishing improvements in our way of life. Hardly anyone
could persist without news or other kinds of information
even throughout the day.</p>
      <p>Despite the information scarcity centuries ago, nowadays we
have the opposite situation. Demand and technology have greatly
increased the amount of information we can acquire. Now
one’s goal is to not get lost in it. As an example, the most
popular Lithuanian news website publishes approximately 80
news articles each day. Add other news websites, not
only from Lithuania but from the entire world, and one would end
up overwhelmed trying to read most of this information.</p>
      <p>
        The field of text data mining emerged to tackle this kind of
problem. It goes “beyond information access to further help
users analyze and digest information and facilitate decision
making” [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Text data mining offers several solutions to better
characterize text documents: summarization, classification
and clustering [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, when evaluated by people, the
best summarization results currently are given only 2-4 points
out of 5 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Today the best classification accuracies are
50–94% [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and clustering achieves an F1 score of about 0.4 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Although
the achieved classification results are more accurate,
clustering is perceived as more promising since it is universal and
can handle unknown categories, as is the case for diverse
news data.
      </p>
      <p>
        After it was shown that artificial neural networks can be
successfully trained and used to reduce dimensionality [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
many new successful data mining models have emerged. The
aim of this work is to test how one of such models, document
to vector (Doc2Vec), can improve the clustering of Lithuanian
news.
      </p>
    </sec>
    <sec id="sec-1b">
      <title>II. RELATED WORK ON LITHUANIAN LANGUAGE</title>
      <p>
        Articles on Lithuanian language document clustering
suggest using K-means [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], spherical K-means [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] or
Expectation-Maximization (EM) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] algorithms. It was also
observed that K-means is fast and suitable for large corpora
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and outperforms other popular algorithms [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] considers Term Frequency / Inverse Document
Frequency (TF-IDF) as the best weighting scheme. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] adds
that it must be used together with stemming while [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
advocates doing minimum and maximum document frequency
filtering before applying TF-IDF. These works show that
TF-IDF is a significant weighting scheme and it could optionally be
tried with some additional preprocessing steps.
      </p>
      <p>
        We have not found any research on the Lithuanian language
regarding document embeddings. However, there is some
work on word embeddings. In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] word embeddings using
different models and training algorithms were compared after
training on 234 million tokens corpus. It was found that
Continuous Bag of Words (CBOW) architecture significantly
outperformed skip-gram method while vector dimensionality
showed no significant impact on the results. This implies that
document embeddings, like word embeddings, should follow the
same CBOW architectural pattern. Other work [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] compared
traditional and deep learning (with use of word embeddings)
approaches for sentiment analysis and found that deep
learning demonstrated good results only when applied on the
small datasets, otherwise traditional methods were better. As
embeddings may underperform in sentiment analysis, we will
test whether this is the case for news clustering.
      </p>
    </sec>
    <sec id="sec-2">
      <title>III. TEXT CLUSTERING PROCESS</title>
      <p>
        To improve clustering quality some text preprocessing
must be done. Every text analytics process consists „of three
consecutive phases: Text Preprocessing, Text Representation
and Knowledge Discovery“ [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] (the last being clustering in our
case).
      </p>
      <sec id="sec-2-1">
        <title>A. Text preprocessing</title>
        <p>
          The purpose of text preprocessing is to make the data more
concise and facilitate text representation. It mainly involves
tokenizing text into features and dropping the ones considered
less important. Extracted features can be words, chars or any
n-gram (contiguous sequence of n items from a given sample
of text) of both. Tokens can also be accompanied by the
structural or placement aspects of document [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>The most and least frequent items are considered
uninformative and dropped. Tokens found in every document
are not descriptive and they usually include stop words such
as “and”, “to”. On the other hand, too rare words are
insufficient to attribute to any characteristic and, due to their
resulting sparse vectors, only complicate the whole process.</p>
        <p>Existing text features can be further concentrated by these
methods:
 stemming;
 lemmatization;
 number normalization;
 allowing only a maximum number of features;
 maximum document frequency – ignore terms that
appear in more than a specified number of documents;
 minimum document frequency – ignore terms that
appear in fewer than a specified number of documents.</p>
        <p>
          It was shown that the use of stemming in Lithuanian news
clustering greatly increased clustering performance [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
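        <p>As an illustration, the filtering and normalization steps above can be combined into a simple token-level pipeline. The Python sketch below is only illustrative: the stop-word list and the lemma dictionary are hypothetical placeholders, while the actual word resources used in this work are described in Section IV.</p>
        <preformat>
# A minimal preprocessing sketch: tokenize, drop stop words, normalize
# numbers and replace known word forms with their lemmas.
import re

STOP_WORDS = {"ir", "bet", "kad"}           # hypothetical Lithuanian stop words
LEMMAS = {"universiteto": "universitetas"}  # hypothetical lemma dictionary entry

def preprocess(text, keep_unknown=True):
    tokens = re.findall(r"\w+", text.lower(), flags=re.UNICODE)
    result = []
    for token in tokens:
        if token in STOP_WORDS:
            continue                        # drop stop words
        if token.isdigit():
            result.append("#NUMBER")        # normalize all numbers to one feature
        elif token in LEMMAS:
            result.append(LEMMAS[token])    # known word form replaced by its lemma
        elif keep_unknown:
            result.append(token)            # optionally keep words without a lemma
    return result
        </preformat>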
      </sec>
      <sec id="sec-2-2">
        <title>B. Text representation</title>
        <p>
          For the computer to make any calculations with the text
data, it must be represented as numerical vectors. The simplest
representation is called “Bag Of Words” (BOW) or “Vector
Space Model” (VSM) where each document has counts or
other derived weights for each vocabulary word. This structure
ignores linguistic text structure. Surprisingly, in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] it was
reviewed that “unordered methods have been found on many
tasks to be extremely well performing, better than several of
the more advanced techniques”, because “there are only a few
likely ways to order any given bag of words”.
        </p>
        <p>
          The most popular weight for BOW is TF-IDF. Recent
study [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] on Lithuanian news clustering has shown that the
TF-IDF weight produced the best clustering results. TF-IDF is
calculated as:
        </p>
        <p>tfidf(w, d) = tf(w, d) · log(N / df(w)),</p>
        <p>where:
 tf(w, d) is the term frequency, the number of occurrences of word w in document d;
 df(w) is the document frequency, the number of documents containing word w;
 N is the number of documents in the corpus.</p>
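        <p>For illustration, such a TF-IDF weighted BOW representation can be obtained, for example, with the scikit-learn library. The sketch below is not the implementation used in this work: the two documents are toy placeholders, the parameter values mirror the defaults listed later in Section VI, and scikit-learn applies a smoothed variant of the idf term.</p>
        <preformat>
# Illustrative TF-IDF bag-of-words representation.
from sklearn.feature_extraction.text import TfidfVectorizer

documents = ["pirmas straipsnis apie sportą", "antras straipsnis apie politiką"]

vectorizer = TfidfVectorizer(
    max_features=10000,  # vocabulary pruned to at most 10000 words
    max_df=0.95,         # ignore terms occurring in more than 95 % of documents
    min_df=0.05,         # ignore terms occurring in fewer than 5 % of documents
)
tfidf_matrix = vectorizer.fit_transform(documents)  # sparse document-term matrix
        </preformat>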
        <p>
          One of the newest and widely adopted document
representation schemes is Doc2Vec [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. It is an extension of
the word-to-vector (Word2Vec) representation. A word in the
Word2Vec representation is regarded as a single vector of real
number values. The assumption of Word2Vec is that the
element values of a word are affected by those of other words
surrounding the target word. This assumption is encoded as a
neural network structure and the network weights are adjusted
by learning
observed
examples [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        Doc2Vec
extends
Word2Vec from the word level to the document level and each
document has its own vector values in the same space as that
for words [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
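        <p>For illustration, a DBOW Doc2Vec model with the settings later listed in Section VI can be trained, for example, with the gensim library (recent versions); the tokenized articles below are toy placeholders and min_count is lowered so that the toy example runs.</p>
        <preformat>
# Minimal Doc2Vec sketch; dm=0 selects the distributed bag of words (DBOW)
# architecture.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

tokenized_articles = [
    ["seimas", "įstatymas", "mokestis"],
    ["rungtynės", "taškas", "pergalė"],
]
corpus = [TaggedDocument(words=tokens, tags=[i])
          for i, tokens in enumerate(tokenized_articles)]

model = Doc2Vec(corpus, dm=0, vector_size=200, window=5,
                min_count=1, epochs=20)  # min_count is 4 in the experiments
document_vectors = [model.dv[i] for i in range(len(corpus))]
        </preformat>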
      <sec id="sec-3-1">
        <title>C. Text clustering</title>
        <p>
          There are tens of clustering algorithms to choose from
[
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. One of the simplest and most widely used is the k-means
algorithm. During initialization, the k-means algorithm selects k
means, which correspond to k clusters. Then the algorithm
repeats two steps: (1) for every data point, choose the nearest
mean and assign the point to the corresponding cluster; (2)
recalculate the means by averaging the data points assigned to the
corresponding cluster.
        <p>The algorithm terminates when the
assignment of the data points does not change after several
iterations. As the clustering depends on the initially selected
centroids, the algorithm is usually run several times to average
over random centroid initializations.</p>
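          <p>For illustration, this procedure can be run, for example, with the scikit-learn implementation of k-means; the document vectors below are random placeholders for embedded or BOW-represented articles.</p>
          <preformat>
# Illustrative k-means clustering of document vectors; n_init controls how
# many random centroid initializations are tried.
import numpy as np
from sklearn.cluster import KMeans

document_vectors = np.random.rand(100, 200)  # placeholder for 100 represented articles

kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(document_vectors)  # cluster index per article
          </preformat>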
      </sec>
    </sec>
    <sec id="sec-4">
      <title>IV. THE DATA</title>
      <sec id="sec-4-1">
        <title>A. Articles</title>
        <p>Article data for this research was scraped from three
Lithuanian news websites: the national lrt.lt and the commercial
websites 15min.lt and delfi.lt. Article URLs were scraped
from sitemaps referenced in the websites’ robots.txt files. A total of 82793
articles (26336 from lrt.lt, 31397 from 15min.lt and 25060
from delfi.lt) was retrieved, spanning random release dates of
the year 2017.</p>
        <p>The raw dataset contains 30338937 tokens, of which
641697 are unique. The unique token count can be decreased to:
 641254, dropping stop words;
 635257, normalizing all numbers to a single feature;
 441178, applying lemmas and leaving unknown words;
 41933, applying lemmas and dropping unknown words;
 434472, dropping stop words, normalizing numbers,
applying lemmas and leaving unknown words.</p>
        <p>Each article has on average 366 tokens and on average 247
unique tokens. The mean token length is 6.51 characters with a
standard deviation of 3.</p>
        <p>
          While analyzing the articles and their accompanying
information, it was noticed that some labelling information
can be acquired from the article URL. The websites have
categorical information between the domain and article id
parts of the URL. A total of 116 distinct categorical descriptions
was retrieved and normalized to 12 distinct categories as
described in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Among these, the residual category “Other” contains 9664 articles
which do not fall into the other categories.
        </p>
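        <p>For illustration only, with a hypothetical URL layout the category segment can be read directly from the URL path; the actual URL patterns differ between the three websites.</p>
        <preformat>
# Hypothetical example of reading the category segment located between the
# domain and the article id part of a news URL.
from urllib.parse import urlparse

url = "https://www.example.lt/naujienos/sportas/straipsnis-12345"  # hypothetical layout
path_parts = urlparse(url).path.strip("/").split("/")
raw_category = path_parts[-2]  # segment before the article id, here "sportas"
        </preformat>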
        <p>It is clearly visible that the category distribution is not
uniform. The biggest categories are “Lithuanian news” and
“World news”, taking up to 49 % of all articles.</p>
      </sec>
      <sec id="sec-4-2">
        <title>B. Words</title>
        <p>Lithuanian word data was scraped from two semantic
databases: morfologija.lt and
tekstynas.vdu.lt/~irena/morfema_search.php. The latter
website has more accurate information, including word
frequency, while the first is very large and was observed to have
some mistakes. Therefore, these two databases were merged,
prioritizing words from the second one. The resulting word
database contained 2212726 different word forms including
72587 lemmas.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>V. CLUSTERING EVALUATION</title>
      <p>The main evaluation metrics can be acquired from the confusion
matrix, depicted in Table I. Here for the true and predicted
conditions we get counts of the following types:</p>
      <p>TP (true positives). The true condition is positive and
the predicted condition is positive.</p>
      <p>TN (true negatives). The true condition is negative and
the predicted condition is negative.</p>
      <p>FP (false positives). The true condition is negative but
the predicted condition is positive.</p>
      <p>FN (false negatives). The true condition is positive but
the predicted condition is negative.</p>
      <p>If this were a classification task, we would know the
real classes and could simply get the percentage of them predicted
accurately. However, in the clustering process we neither know the
actual class nor have a meaning for the returned predicted
cluster. We must rely on additional information - the label of the
news article category, given by the editor of the news website.
This way we make the assumption that the clusters we want to
achieve are similar to the categories of articles. There indeed must
be a reason, some similarity between articles, why they were
put in the same category. The only drawback of our approach
is that a high number of documents requires many
pair calculations. Based on the chosen condition, the confusion matrix
elements are as follows:</p>
      <p>TP – pairs of articles that have the same category label and are
predicted to be in the same cluster.</p>
      <p>TN – pairs of articles that belong to different categories and
are predicted to be in different clusters.</p>
      <p>FP – pairs of articles that belong to different categories but
are predicted to be in the same cluster.</p>
      <p>FN – pairs of articles that have the same category label but
are predicted to be in different clusters.</p>
      <p>We will use F1, as the one widely used, and MCC, as the more
robust, evaluation scores:</p>
      <p>F1 = 2·TP / (2·TP + FP + FN),</p>
      <p>MCC = (TP·TN − FP·FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)).</p>
      <p>MCC score ranges from -1 (total disagreement) to 1
(perfect prediction), while 0 means no better than random
prediction. F1 score varies from 0 (the worst) to 1 (perfect).</p>
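      <p>For illustration, the pair-based counts and both scores can be computed by brute force over all article pairs, which also illustrates the quadratic number of pair calculations mentioned above; the sketch below is only a naive reference implementation.</p>
      <preformat>
# Pair-based confusion counts and the resulting F1 and MCC scores.
from itertools import combinations
from math import sqrt

def pairwise_scores(category_labels, cluster_labels):
    tp = tn = fp = fn = 0
    for i, j in combinations(range(len(category_labels)), 2):
        same_category = category_labels[i] == category_labels[j]
        same_cluster = cluster_labels[i] == cluster_labels[j]
        if same_category and same_cluster:
            tp += 1
        elif same_category:
            fn += 1          # same category, different clusters
        elif same_cluster:
            fp += 1          # different categories, same cluster
        else:
            tn += 1          # different categories, different clusters
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc = ((tp * tn - fp * fn)
           / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return f1, mcc
      </preformat>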
    </sec>
    <sec id="sec-6">
      <title>VI. EXPERIMENTS</title>
      <p>To ensure that experiments are as reproducible as possible,
each experiment was repeated 50 times and a confidence
interval of the resulting clustering scores was calculated. In each
repetition a distinct set of articles was randomly
selected from the dataset. However, for the same number
of documents this repeated random pickup would be the same
(if we were to run another experiment with the same number of
documents, these 50 samplings of articles would be the
same). This ensures that we evaluate as much data as possible
while keeping the same subsets across different experiments.</p>
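      <p>For illustration, one way to obtain such fixed subsets is to derive the random seed from the repetition index only, so that for a given number of documents the same 50 article subsets reappear in every experiment; the seeding scheme below is an assumption made for the sketch.</p>
      <preformat>
# Reproducible repeated sampling: the seed depends only on the repetition
# index, so the drawn subsets are identical across experiments.
import numpy as np

def sample_subsets(article_ids, n_documents, repetitions=50):
    subsets = []
    for repetition in range(repetitions):
        rng = np.random.RandomState(repetition)
        chosen = rng.choice(article_ids, size=n_documents, replace=False)
        subsets.append(chosen)
    return subsets
      </preformat>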
      <p>All experiments were carried out using only articles from
the 10 biggest categories. From each of them an equal number of
articles was sampled. Only variables associated with the dataset
loading, text preprocessing, and representation phases were
varied. The actual clustering was done using the k-means algorithm.</p>
      <p>In all experiments the following actions and parameters
were used if not specified otherwise:
 used 1500 articles;
 vocabulary pruned to a maximum of 10000 words;
 0.95 maximum document frequency (BOW);
 0.05 minimum document frequency (BOW);
 Distributed Bag of Words (DBOW) architecture of the Doc2Vec model used;
 Doc2Vec method trained on the same articles to be clustered (not the whole corpus);
 window size of 5 words (Doc2Vec models);
 20 training epochs (Doc2Vec models);
 vector size of 200 (Doc2Vec models);
 minimum word count of 4 (Doc2Vec models);
 all numbers normalized to a single “#NUMBER” feature;
 words with a known lemma lemmatized;
 words in the stop word list dropped from documents;
 unigrams used (a feature is a single word).</p>
      <sec id="sec-7-1">
        <title>A. Number of articles and preprocessor method experiment</title>
        <p>In this experiment the dataset size and the preprocessing method
were varied to determine how the two are correlated. The tried text
representations include BOW and Doc2Vec with the distributed
bag of words variation. It was also examined how well
Doc2Vec would perform if trained on all the 82793 articles.</p>
      </sec>
      <sec id="sec-7-2">
        <title>B. Reducing words to lemmas experiment</title>
        <p>This experiment investigated 3 scenarios:</p>
        <p>1) lemmas are not used;
2) words for which lemmas could be found were
replaced with them and other words discarded;
3) same as 2 but unknown words remained.</p>
        <p>Another parameter, namely the maximum number of features,
addresses similar issues as lemmatization. For this reason
several values of the maximum number of allowed features were
tried.</p>
      <sec id="sec-8-1">
        <title>C. Training epochs and embedding vector size experiment</title>
        <p>In this experiment two parameters for Doc2Vec were
optimized: training epochs (from 5 to 100) and vector size
(from 5 to 400). Distributed bag of words version of Doc2Vec
was used.</p>
      </sec>
      <sec id="sec-8-2">
        <title>D. Clustering articles from a defined release interval</title>
        <p>In this experiment the best configurations for BOW and
Doc2Vec will be tried on articles released during one week, from
2017-04-28 to 2017-05-04, covering a total of 1001
articles. Both models will be run 50 times on the same articles
and the best run selected. Doc2Vec is trained on the same articles
used for clustering, using a maximum of 40000 features
and a vector size of 52.</p>
        <p>The best resulting clusters will be analyzed with the same
BOW workflow as the documents, but reducing features only with
0.8 maximum and 0.1 minimum document frequencies. The 10
words with the biggest TF-IDF weights will be selected as
representatives of each cluster.</p>
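        <p>For illustration, one way to implement this selection is to fit a TF-IDF vectorizer with the above frequency cut-offs on the clustered articles, sum the weights within each cluster, and take the ten highest-weighted features; the sketch below (using a recent scikit-learn version) is only one possible realization.</p>
        <preformat>
# Selecting the 10 highest TF-IDF weighted features as cluster representatives.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_keywords(documents, cluster_labels, top_n=10):
    vectorizer = TfidfVectorizer(max_df=0.8, min_df=0.1)
    weights = vectorizer.fit_transform(documents)
    feature_names = np.array(vectorizer.get_feature_names_out())
    keywords = {}
    for cluster in sorted(set(cluster_labels)):
        rows = [i for i, label in enumerate(cluster_labels) if label == cluster]
        cluster_weights = np.asarray(weights[rows].sum(axis=0)).ravel()
        top_indices = cluster_weights.argsort()[::-1][:top_n]
        keywords[cluster] = feature_names[top_indices].tolist()
    return keywords
        </preformat>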
      </sec>
    </sec>
    <sec id="sec-9">
      <title>VII. RESULTS AND ANALYSIS</title>
      <sec id="sec-9-1">
        <title>A. Number of articles and preprocessor method experiment</title>
        <p>Experiment results are shown in Fig. 1. The best recorded
MCC score is 0.403 (0.464 for F1) for the Doc2Vec distributed
bag of words variation trained on the whole corpus and clustering
3000 articles. It is clearly visible that all text representation
models are better with a higher number of documents. When
clustering a small number of documents we can observe that
the BOW model outperforms Doc2Vec if the latter is trained only
on the documents that are later used for clustering. However,
starting with 300 documents Doc2Vec outperforms the BOW
model. This shows that the Doc2Vec model depends on how
many documents it is trained on, as the model trained on the whole
corpus has the biggest MCC score of 0.201 when clustering
100 articles. However, the advantage of training on the whole corpus
instead of only the documents to be clustered quickly diminishes
as the number of clustered documents approaches 700.</p>
      </sec>
      <sec id="sec-9-1b">
        <title>B. Reducing words to lemmas experiment</title>
        <p>Experiment results are depicted in Fig. 2. It was observed
that converting known words to lemmas gives an MCC score
boost both for BOW and Doc2Vec models. The highest
increase of MCC score (from 0.122 to 0.221 for 10000
maximum features) for the BOW representation is observed when,
after lemmatization, non-lemmatized words are dropped. On
the other hand, the Doc2Vec representation yields a higher MCC
score increase when non-lemmatized words are left (from 0.356
to 0.401 for a 40000 maximum number of features). It is clearly
visible that both vectorization methods benefit from
lemmatization.</p>
      </sec>
      <sec id="sec-9-2">
        <title>C. Training epochs and embedding vector size experiment</title>
        <p>Clustering results for several epochs and vector sizes are
depicted in Fig. 3. The highest average MCC score of 0.381 was
recorded for a vector size of 150 and 20 epochs. It is
interesting to note that increasing the number of training epochs
to 100 reduces the MCC to 0.316. This reduction is observed for
all vector sizes and could be explained as overfitting. On the
other hand, only 5 epochs give poor results with a maximum
MCC of 0.133 for a vector size of 10, and this should be regarded
as underfitting. With the optimal number of training epochs being
20, there are many vector sizes (from 20 to 400) yielding very
similar MCC results. This shows that small vector sizes such
as 20 are enough, when trained for 20 epochs on the 1500-article
dataset, for a good text representation.</p>
      </sec>
      <sec id="sec-9-3">
        <title>D. Clustering articles from defined release interval</title>
        <p>The best Doc2Vec model trained on a small corpus
outperformed the best BOW model (MCC 0.318 and 0.145,
F1 0.415 and 0.282). Cluster features and statistics of the
Doc2Vec model are depicted in Table I. It shows that the model
performs reasonably well and can distinguish:
 a very small (1.9 % of all articles) distinct weather
forecast category (cluster Nr. 5);
 classical categories such as culture, sports, and crime
(clusters Nr. 3, 8 and 10);
 hot topics such as the university reform, Brexit and current
political scandals (clusters Nr. 1, 4 and 8).</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>VIII. CONCLUSIONS</title>
      <p>In this work the BOW and Doc2Vec text representation
methods were compared. Our research shows that Doc2Vec
greatly outperforms the BOW model: when clustering a week’s worth of
data, the highest MCC scores are 0.318 versus 0.145. However,
for the Doc2Vec method to outperform BOW when clustering fewer
than 300 articles, it must be trained on a much larger dataset.
We estimated that embedding vector sizes of 20 and above are
large enough and that the optimal number of training epochs is
around 20. Analysis of the conversion of words to their lemmas
showed that lemmatization is beneficial for both BOW and
Doc2Vec representations.</p>
      <p>Most descriptive features of each cluster and their translation to English:</p>
      <p>universitetas, mokslas, eur, mokykla, studija, pertvarka, akademija,
rektorius, vu, kokybė // university, science, eur, school, study,
transformation, academy, rector, vu (Vilnius University), quality</p>
      <p>muzika, alkoholis, kultūra, ntv, filmas, visuomenė, maistas, namas,
liga, lelkaitis // music, alcohol, culture, ntv, film, society, food,
house, illness, lelkaitis (surname of a person)</p>
      <p>koncertas, teatras, muzika, rež, biblioteka, festivalis, džiazas,
kultūra, paroda, muziejus // concert, theater, music, dir, library,
festival, jazz, culture, exhibition, museum</p>
      <p>es, brexit, derybos, le, pen, may, macronas, partija, th, politinis //
es, brexit, talks, le, pen, may, macron, party, th, political</p>
      <p>laipsnis, šiluma, temperatūra, naktis, debesis, debesuotumas, lietus,
įdienojus, pūs, termometrai // degree, heat, temperature, night,
cloud, clouds, rain, in broad daylight, will blow, thermometers</p>
      <p>jav, korėtis, raketa, korėja, branduolinis, putinas, jungtinis, pajėgos,
karinis, sirijos // usa, korėtis, rocket, korea, nuclear, putin, united,
forces, military, syrian</p>
      <p>įmonė, seimas, įstatymas, mokestis, savivaldybė, kaina, šiluma,
asmuo, projektas, pajamos // company, parliament, law, tax,
municipality, price, heat, person, project, income</p>
      <p>seimas, pūkas, partija, teismas, komisija, konstitucija, pirmininkas,
įstatymas, apkalti, taryba // parliament, pūkas (surname of a person),
party, court, commission, constitution, chairman, law,
impeachment, board</p>
      <p>rungtynės, taškas, žaidėjas, čempionatas, ekipa, rinktinė, įvartis,
pelnyti, pergalė, raptors // match, point, player, championship,
team, team, goal, win, victory, raptors (name of a basketball club)</p>
      <p>policija, automobilis, vyras, vairuotojas, pranešti, įtariamas,
sulaikyti, žūti, teismas, asmuo // police, car, man, driver, report,
suspected, detained, die, court, person</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Aggarwal</surname>
            <given-names>CC</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            <given-names>C</given-names>
          </string-name>
          , editors.
          <source>Mining text data. Springer Science &amp; Business Media; 2012 Feb 3.</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Liu</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qu</surname>
            <given-names>Q</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>H</given-names>
          </string-name>
          .
          <article-title>Generative adversarial network for abstractive text summarization</article-title>
          .
          <source>In Thirty-Second AAAI Conference on Artificial Intelligence 2018 Apr</source>
          <volume>29</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Liu</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <article-title>Bidirectional LSTM with attention mechanism and convolutional layer for text classification</article-title>
          .
          <source>Neurocomputing. 2019 Feb</source>
          <volume>1</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V.</given-names>
            <surname>Pranckaitis</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lukoševičius</surname>
          </string-name>
          ,
          <article-title>Clustering of Lithuanian news articles</article-title>
          .
          <source>Proceedings of the IVUS</source>
          <year>2017</year>
          , pp.
          <fpage>27</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Hinton</surname>
            <given-names>GE</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            <given-names>RR</given-names>
          </string-name>
          .
          <article-title>Reducing the dimensionality of data with neural networks</article-title>
          .
          <source>science. 2006 Jul</source>
          <volume>28</volume>
          ;
          <volume>313</volume>
          (
          <issue>5786</issue>
          ):
          <fpage>504</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Mackutė-Varoneckienė</surname>
            <given-names>Aušra</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krilavičius</surname>
            <given-names>Tomas</given-names>
          </string-name>
          .
          <article-title>Empirical study on unsupervised feature selection for document clustering</article-title>
          .
          <source>In Human Language Technologies - The Baltic Perspective</source>
          <year>2014</year>
          . p.
          <fpage>107</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Ciganaitė</surname>
            <given-names>Greta</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mackutė-Varoneckienė</surname>
            <given-names>Aušra</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krilavičius</surname>
            <given-names>Tomas</given-names>
          </string-name>
          .
          <article-title>Text documents clustering</article-title>
          .
          <source>Informacinės technologijos: XIX tarpuniversitetinė magistrantų ir doktorantų konferencija „Informacinė visuomenė ir universitetinės studijos“ (IVUS 2014): konferencijos pranešimų medžiaga</source>
          ,
          <year>2014</year>
          , p.
          <fpage>90</fpage>
          -
          <lpage>93</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Kapočiūtė-Dzikienė</surname>
            <given-names>Jurgita</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Damaševičius</surname>
            <given-names>Robertas</given-names>
          </string-name>
          .
          <article-title>Intrinsic evaluation of Lithuanian word embeddings using WordNet</article-title>
          .
          <source>Computer Science On-line Conference</source>
          . Springer, Cham,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Kapočiūtė-Dzikienė</surname>
            <given-names>Jurgita</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Damaševičius</surname>
            <given-names>Robertas</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Woźniak</surname>
            <given-names>Marcin</given-names>
          </string-name>
          .
          <article-title>Sentiment analysis of Lithuanian texts using traditional and deep learning approaches</article-title>
          .
          <source>Computers 8</source>
          .1 (
          <year>2019</year>
          ):
          <fpage>4</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Aker</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paramita</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kurtic</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Funk</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barker</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hepple</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaizauskas</surname>
            <given-names>R</given-names>
          </string-name>
          .
          <article-title>Automatic label generation for news comment clusters</article-title>
          .
          <source>In Proceedings of the 9th International Natural Language Generation Conference</source>
          <year>2016</year>
          (pp.
          <fpage>61</fpage>
          -
          <lpage>69</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>White</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Togneri</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bennamoun</surname>
            <given-names>M</given-names>
          </string-name>
          .
          <article-title>Sentence Representations and Beyond</article-title>
          .
          <source>In Neural Representations of Natural Language</source>
          <year>2019</year>
          (pp.
          <fpage>93</fpage>
          -
          <lpage>114</lpage>
          ). Springer, Singapore.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Le</surname>
            <given-names>Quoc</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            <given-names>Tomas</given-names>
          </string-name>
          .
          <article-title>Distributed representations of sentences and documents</article-title>
          .
          <source>In: International conference on machine learning</source>
          .
          <year>2014</year>
          . p.
          <fpage>1188</fpage>
          -
          <lpage>1196</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Mikolov</surname>
            <given-names>Tomas</given-names>
          </string-name>
          , et al.
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>arXiv preprint arXiv:1301.3781</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Aggarwal</surname>
            <given-names>Charu C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reddy</surname>
            <given-names>Chandan K.</given-names>
          </string-name>
          .
          <source>Data Clustering: Algorithms and Applications</source>
          . Chapman &amp; Hall/CRC.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>