=Paper=
{{Paper
|id=Vol-2470/p30
|storemode=property
|title=Lithuanian news clustering using document embeddings
|pdfUrl=https://ceur-ws.org/Vol-2470/p30.pdf
|volume=Vol-2470
|authors=Lukas Stankevičius,Mantas Lukoševičius
|dblpUrl=https://dblp.org/rec/conf/ivus/StankeviciusL19
}}
==Lithuanian news clustering using document embeddings==
Lukas Stankevičius and Mantas Lukoševičius
Faculty of Informatics, Kaunas University of Technology, Kaunas, Lithuania
lukas.stankevicius@ktu.edu, mantas.lukosevicius@ktu.lt

© 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Abstract—A lot of natural language processing research is done and applied on English texts, but relatively little is tried on less popular languages. In this article document embeddings are compared with traditional bag of words methods for Lithuanian news clustering. The results show that, given enough documents, the embeddings greatly outperform simple bag of words representations. In addition, optimal lemmatization, embedding vector size, and number of training epochs were investigated.

Keywords—document clustering; document embedding; lemmatization; Lithuanian news articles.

I. INTRODUCTION

Knowledge and information are an inseparable part of our civilization. For thousands of years, information from news of incoming troops to ordinary know-how could mean the difference between life and death. Knowledge accumulation throughout the centuries led to astonishing improvements in our way of life. Nowadays hardly anyone could last even a day without news or other kinds of information.

Despite the information scarcity of centuries ago, today we have the opposite situation. Demand and technology have greatly increased the amount of information we can acquire, and now one's goal is not to get lost in it. As an example, the most popular Lithuanian news website publishes approximately 80 news articles each day. Add other news websites, not only from Lithuania but from the entire world, and one would end up overwhelmed trying to read most of this information.

The field of text data mining emerged to tackle this kind of problem. It goes "beyond information access to further help users analyze and digest information and facilitate decision making" [1]. Text data mining offers several solutions to better characterize text documents: summarization, classification and clustering [1]. However, when evaluated by people, the best summarization results are currently given only 2-4 points out of 5 [2]. Today the best classification accuracies are 50-94% [3], and clustering reaches an F1 score of about 0.4 [4]. Although the achieved classification results are more accurate, clustering is perceived as more promising, since it is universal and can handle unknown categories, as is the case for diverse news data.

After it was shown that artificial neural networks can be successfully trained and used to reduce dimensionality [5], many new successful data mining models emerged. The aim of this work is to test how one of such models – document to vector (Doc2Vec) – can improve clustering of Lithuanian news.

II. RELATED WORK ON LITHUANIAN LANGUAGE

Articles on Lithuanian document clustering suggest using K-means [4], spherical K-means [6] or Expectation-Maximization (EM) [7] algorithms. It was also observed that K-means is fast and suitable for large corpora [7] and outperforms other popular algorithms [4].

[6] considers Term Frequency / Inverse Document Frequency (TF-IDF) the best weighting scheme. [4] adds that it must be used together with stemming, while [6] advocates minimum and maximum document frequency filtering before applying TF-IDF. These works show that TF-IDF is a significant weighting scheme and that it could optionally be combined with some additional preprocessing steps.

We have not found any research on document embeddings for the Lithuanian language. However, there is some work on word embeddings. In [8] word embeddings produced by different models and training algorithms were compared after training on a 234 million token corpus. It was found that the Continuous Bag of Words (CBOW) architecture significantly outperformed the skip-gram method, while vector dimensionality showed no significant impact on the results. This implies that document embeddings, like word embeddings, should follow the same CBOW architectural pattern. Other work [9] compared traditional and deep learning (using word embeddings) approaches for sentiment analysis and found that deep learning demonstrated good results only when applied on the small datasets; otherwise traditional methods were better. As embeddings may underperform in sentiment analysis, we test whether this is also the case for news clustering.

III. TEXT CLUSTERING PROCESS

To improve clustering quality, some text preprocessing must be done. Every text analytics process consists "of three consecutive phases: Text Preprocessing, Text Representation and Knowledge Discovery" [1] (the last being clustering in our case).

A. Text preprocessing

The purpose of text preprocessing is to make the data more concise and to facilitate text representation. It mainly involves tokenizing text into features and dropping the ones considered less important. Extracted features can be words, characters, or any n-grams (contiguous sequences of n items from a given sample of text) of both. Tokens can also be accompanied by structural or placement aspects of the document [10].
The most and least frequent items are considered uninformative and dropped. Tokens found in every document are not descriptive; they usually include stop words such as "and" or "to". On the other hand, words that are too rare cannot be attributed to any characteristic and, due to the resulting sparse vectors, only complicate the whole process. Existing text features can be further concentrated by these methods:

- stemming;
- lemmatization;
- number normalization;
- allowing only a maximum number of features;
- maximum document frequency – ignore terms that appear in more than a specified share of documents;
- minimum document frequency – ignore terms that appear in fewer than a specified share of documents.

It was shown that the use of stemming in Lithuanian news clustering greatly increases clustering performance [4].
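To make these options concrete, here is a minimal preprocessing sketch in Python. It is not the authors' code: the stop-word excerpt and the lemma map are illustrative stand-ins, while the number normalization and document-frequency pruning mirror the steps listed above.

```python
import re
from collections import Counter

STOP_WORDS = {"ir", "į", "kad", "su"}        # assumed stop-word list (excerpt)
LEMMAS = {"universitetai": "universitetas"}  # assumed word-form -> lemma map

def tokenize(text):
    """Lowercase, extract word features, normalize numbers, drop stop words."""
    features = []
    for token in re.findall(r"\w+", text.lower()):
        if token.isdigit():
            features.append("#NUMBER")                 # number normalization
        elif token in STOP_WORDS:
            continue                                   # stop-word dropping
        else:
            features.append(LEMMAS.get(token, token))  # lemmatize known words
    return features

def prune_by_document_frequency(documents, min_df=0.05, max_df=0.95):
    """Drop tokens whose document frequency falls outside [min_df, max_df]."""
    n = len(documents)
    df = Counter(token for tokens in documents for token in set(tokens))
    kept = {t for t, count in df.items() if min_df <= count / n <= max_df}
    return [[t for t in tokens if t in kept] for tokens in documents]
```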
B. Text representation

For the computer to make any calculations with the text data, it must be represented as numerical vectors. The simplest representation is called "Bag Of Words" (BOW) or "Vector Space Model" (VSM): each document holds counts, or other weights derived from them, for each vocabulary word. This structure ignores the linguistic structure of the text. Surprisingly, the review in [11] found that "unordered methods have been found on many tasks to be extremely well performing, better than several of the more advanced techniques", because "there are only a few likely ways to order any given bag of words".

The most popular weighting for BOW is TF-IDF. A recent study [4] on Lithuanian news clustering has shown that the TF-IDF weight produced the best clustering results. TF-IDF is calculated as:

$\mathrm{tfidf}(w, d) = \mathrm{tf}(w, d) \cdot \log \frac{N}{\mathrm{df}(w)}$

where:

- $\mathrm{tf}(w, d)$ is the term frequency, the number of occurrences of word $w$ in document $d$;
- $\mathrm{df}(w)$ is the document frequency, the number of documents containing word $w$;
- $N$ is the number of documents in the corpus.
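As an illustration, this weighting can be computed directly from token counts. The sketch below follows the formula exactly as written above (no smoothing), which may differ from library defaults such as scikit-learn's `TfidfVectorizer`.

```python
import math
from collections import Counter

def tfidf(documents):
    """documents: list of token lists. Returns a {word: weight} dict per document."""
    n = len(documents)
    df = Counter(token for tokens in documents for token in set(tokens))
    weighted = []
    for tokens in documents:
        tf = Counter(tokens)
        weighted.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return weighted

docs = [["naujienos", "klasteris"], ["naujienos", "vektorius"]]
print(tfidf(docs))  # "naujienos" occurs in every document, so its weight is log(2/2) = 0
```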
One of the newest and widely adopted document representation schemes is Doc2Vec [12]. It is an extension of the word-to-vector (Word2Vec) representation. A word in the Word2Vec representation is regarded as a single vector of real number values. The assumption of Word2Vec is that the element values of a word are affected by those of the other words surrounding the target word. This assumption is encoded as a neural network structure, and the network weights are adjusted by learning from observed examples [13]. Doc2Vec extends Word2Vec from the word level to the document level: each document has its own vector values in the same space as that for words [12].
C. Text clustering

There are tens of clustering algorithms to choose from [14]. One of the simplest and most widely used is the k-means algorithm. During initialization, k-means selects k means, which correspond to k clusters. The algorithm then repeats two steps: (1) for every data point, choose the nearest mean and assign the point to the corresponding cluster; (2) recalculate the means by averaging the data points assigned to the corresponding cluster. The algorithm terminates when the assignment of the data points does not change over several iterations. As the clustering depends on the initially selected centroids, the algorithm is usually run several times to average over random centroid initializations.
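A compact sketch of this procedure with scikit-learn, where `n_init` reruns the algorithm over several random centroid initializations and keeps the best run; the input matrix is a placeholder for the document vectors.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(1500, 200)  # placeholder: one 200-dimensional vector per document

# Steps (1) and (2) from above are iterated until assignments stabilize; n_init=10
# repeats the whole procedure with random initial centroids and keeps the best run.
kmeans = KMeans(n_clusters=10, n_init=10)
cluster_labels = kmeans.fit_predict(X)
```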
IV. THE DATA

A. Articles

Article data for this research was scraped from three Lithuanian news websites: the national lrt.lt and the commercial websites 15min.lt and delfi.lt. Article URLs were collected from the sitemaps listed in the websites' robots.txt files. A total of 82793 articles (26336 from lrt.lt, 31397 from 15min.lt and 25060 from delfi.lt) were retrieved, spanning random release dates in 2017.

The raw dataset contains 30338937 tokens, of which 641697 are unique. The unique token count can be decreased to:

- 641254 by dropping stop words;
- 635257 by normalizing all numbers to a single feature;
- 441178 by applying lemmas and leaving unknown words;
- 41933 by applying lemmas and dropping unknown words;
- 434472 by dropping stop words, normalizing numbers, applying lemmas and leaving unknown words.

Each article has on average 366 tokens and on average 247 unique tokens. The mean token length is 6.51 characters, with a standard deviation of 3.

While analyzing the articles and their accompanying information, it was noticed that some labelling information can be acquired from the article URL: the websites place categorical information between the domain and the article id parts of the URL. A total of 116 distinct categorical descriptions were received and normalized to 12 distinct categories, as described in [4]. The category distribution is:

- Lithuania news (20162 articles);
- World news (21052 articles);
- Crime (7502 articles);
- Business (7280 articles);
- Cars (1557 articles);
- Sports (5913 articles);
- Technologies (1919 articles);
- Opinions (2553 articles);
- Entertainment (769 articles);
- Life (944 articles);
- Culture (3478 articles);
- Other (9664 articles, which do not fall into the previous categories).

It is clearly visible that the category distribution is not uniform. The biggest categories are "Lithuania news" and "World news", together taking up to 49% of all articles.

B. Words

Lithuanian word data was scraped from two semantic information databases: morfologija.lt and tekstynas.vdu.lt/~irena/morfema_search.php. The latter website has more accurate information, including word frequency, while the first is very large but was observed to have some mistakes. Therefore, these two databases were merged, prioritizing words from the second one, as sketched below. The resulting word database contained 2212726 different word forms, including 72587 lemmas.
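A sketch of this merge, assuming each database has been loaded as a word-form to lemma mapping; the example entries are illustrative.

```python
# Each database loaded as an assumed word-form -> lemma mapping (entries illustrative).
forms_morfologija = {"namo": "namas", "vaikui": "vaikas"}
forms_morfema = {"namo": "namas"}  # the more accurate source

# dict.update overwrites duplicate keys, so the second (prioritized) database wins.
merged_forms = dict(forms_morfologija)
merged_forms.update(forms_morfema)
```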
V. CLUSTERING EVALUATION

The main evaluation metrics can be derived from the confusion matrix. For the true and predicted conditions we count occurrences of the following types:

- TP (true positives): the true condition is positive and the predicted condition is positive.
- TN (true negatives): the true condition is negative and the predicted condition is negative.
- FP (false positives): the true condition is negative but the predicted condition is positive.
- FN (false negatives): the true condition is positive but the predicted condition is negative.

If this were a classification task, we would know the real classes and could simply compute the percentage of them predicted accurately. However, in the clustering process we neither know the actual class nor have a meaning for the returned predicted class. We must rely on additional information: the label of the news article category, given by the editor of the news website. This way we make the assumption that the clusters we want to achieve are similar to the categories of the articles. There indeed must be a reason, some similarity between articles, why they were put in the same category. The only drawback of our approach is that a high number of documents requires many pair calculations. Based on the chosen condition, the confusion matrix elements are as follows:

- TP – pairs of articles that have the same category label and are predicted to be in the same cluster.
- TN – pairs of articles that belong to different categories and are predicted to be in different clusters.
- FP – pairs of articles that belong to different categories but are predicted to be in the same cluster.
- FN – pairs of articles that have the same category label but are predicted to be in different clusters.

We will use F1, as the one widely used, and MCC, as a more robust, evaluation score:

$F_1 = 2 \cdot \frac{\mathit{precision} \cdot \mathit{recall}}{\mathit{precision} + \mathit{recall}}$

$\mathit{precision} = \frac{TP}{TP + FP}$

$\mathit{recall} = \frac{TP}{TP + FN}$

$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$

The MCC score ranges from -1 (total disagreement) to 1 (perfect prediction), while 0 means no better than random prediction. The F1 score varies from 0 (the worst) to 1 (perfect).
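A sketch of this pair-counting evaluation, assuming two equal-length lists: the editor-given category labels and the predicted cluster indices. The quadratic loop over all pairs reflects the drawback mentioned above.

```python
from itertools import combinations
from math import sqrt

def pairwise_scores(categories, clusters):
    """Count all article pairs into TP/TN/FP/FN and derive the F1 and MCC scores."""
    tp = tn = fp = fn = 0
    for i, j in combinations(range(len(categories)), 2):
        same_category = categories[i] == categories[j]
        same_cluster = clusters[i] == clusters[j]
        if same_category and same_cluster:
            tp += 1
        elif same_category:  # same category, different clusters
            fn += 1
        elif same_cluster:   # different categories, same cluster
            fp += 1
        else:
            tn += 1
    # Assumes the non-degenerate case where no denominator is zero.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return f1, mcc
```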
VI. EXPERIMENTS

To ensure that the experiments are as reproducible as possible, each experiment was repeated 50 times and a confidence interval of each resulting clustering score was calculated. In each repetition a random subset of articles of the required size was selected from the dataset anew. However, for the same number of documents this repeated random pickup is kept identical: if another experiment uses the same number of documents, it sees the same 50 samplings of articles. This ensures that we evaluate as much data as possible while keeping the same subsets across different experiments (a minimal sketch of such deterministic sampling is given after the parameter list below).

All experiments were carried out using only articles from the 10 biggest categories, with an equal number of articles sampled from each. Only variables associated with the dataset loading, text preprocessing and representation phases were varied. The actual clustering was done using the k-means algorithm.

In all experiments the following actions and parameters were used, if not specified otherwise:

- 1500 articles used;
- vocabulary pruned to a maximum of 10000 words;
- 0.95 maximum document frequency (BOW);
- 0.05 minimum document frequency (BOW);
- Distributed Bag of Words (DBOW) architecture of the Doc2Vec model used;
- Doc2Vec trained on the same articles that are to be clustered (not the whole corpus);
- window size of 5 words (Doc2Vec models);
- 20 training epochs (Doc2Vec models);
- vector size of 200 (Doc2Vec models);
- minimum word count of 4 (Doc2Vec models);
- all numbers normalized to a single "#NUMBER" feature;
- words with a known lemma lemmatized;
- words in the stop word list dropped from documents;
- unigrams used (a feature is a single word).
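One way to obtain such repeatable sampling is to derive the random seed from the number of documents and the repetition index, so that every experiment with the same document count sees the same 50 subsets. The seeding scheme below is our illustrative assumption, not the authors' code.

```python
import numpy as np

def sample_articles(article_ids, n_docs, repetition):
    """The same (n_docs, repetition) pair always yields the same article subset."""
    rng = np.random.RandomState(seed=n_docs * 100 + repetition)  # assumed seeding scheme
    return rng.choice(article_ids, size=n_docs, replace=False)

# 50 repeatable samplings of 1500 articles out of the 82793 scraped ones:
subsets = [sample_articles(np.arange(82793), 1500, rep) for rep in range(50)]
```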
A. Number of articles and preprocessor method experiment

In this experiment the dataset size and the preprocessing method were varied to determine how the two are related. The tried text representations include BOW and Doc2Vec with the distributed bag of words variation. It was also examined how well Doc2Vec would perform if trained on all 82793 articles.

B. Reducing words to lemmas experiment

This experiment investigated 3 scenarios:
1) lemmas are not used;
2) words for which lemmas could be found were replaced with them, and the other words were discarded;
3) same as 2), but unknown words remained.

Another parameter, namely the maximum number of features, addresses similar issues as lemmatization. For this reason several values of the maximum number of allowed features were tried.

C. Training epochs and embedding vector size experiment

In this experiment two parameters of Doc2Vec were optimized: the number of training epochs (from 5 to 100) and the vector size (from 5 to 400). The distributed bag of words version of Doc2Vec was used.

D. Clustering articles from a defined release interval

In this experiment the best configurations for BOW and Doc2Vec are tried on articles released in one week, from 2017-04-28 to 2017-05-04, covering a total of 1001 articles. Both models are run 50 times on the same articles and the best run is selected. Doc2Vec is trained on the same articles that are used for clustering, with a maximum of 40000 features and a vector size of 52.

The best resulting clusters are then analyzed with the same BOW workflow as the documents, but reducing features only with 0.8 maximum and 0.1 minimum document frequencies. The 10 words with the biggest TF-IDF weights are selected as representative of each cluster (a sketch of this extraction is given below).
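A sketch of this descriptor extraction, assuming raw article texts and their predicted cluster labels. Scikit-learn's `TfidfVectorizer` stands in for the BOW workflow with the 0.8/0.1 document-frequency limits, and summing the weights over a cluster's articles is one plausible reading of "biggest TF-IDF weights".

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def describe_clusters(texts, labels, top_n=10):
    """Return the top_n words with the largest summed TF-IDF weight per cluster."""
    vectorizer = TfidfVectorizer(max_df=0.8, min_df=0.1)
    tfidf = vectorizer.fit_transform(texts)  # documents x vocabulary matrix
    words = np.array(vectorizer.get_feature_names_out())
    labels = np.array(labels)
    descriptors = {}
    for cluster in np.unique(labels):
        weights = np.asarray(tfidf[labels == cluster].sum(axis=0)).ravel()
        descriptors[cluster] = words[np.argsort(weights)[::-1][:top_n]].tolist()
    return descriptors
```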
VII. RESULTS AND ANALYSIS

A. Number of articles and preprocessor method experiment

Experiment results are shown in Fig. 1. The best recorded MCC score is 0.403 (0.464 F1) for Doc2Vec in the distributed bag of words variation, trained on the whole corpus and clustering 3000 articles. It is clearly visible that all text representation models perform better with a higher number of documents. When clustering a small number of documents, we can observe that the BOW model outperforms Doc2Vec if the latter is trained only on the documents that are later clustered. However, starting with 300 documents, Doc2Vec outperforms the BOW model. This shows how much the Doc2Vec model depends on the amount of documents it is trained on, as the model trained on the whole corpus has the biggest MCC score of 0.201 when clustering 100 articles. However, the advantage of training on the whole corpus instead of only the documents to be clustered quickly diminishes as the number of clustered documents approaches 700.

Fig. 1. MCC score dependency on text representation method and number of documents used in clustering.

B. Reducing words to lemmas experiment

Experiment results are depicted in Fig. 2. It was observed that converting known words to lemmas boosts the MCC score for both the BOW and the Doc2Vec models. The highest increase of the MCC score for the BOW representation (from 0.122 to 0.221 for 10000 maximum features) is observed when the non-lemmatized words are dropped after lemmatization. On the other hand, the Doc2Vec representation yields a higher MCC score increase when the non-lemmatized words are left (from 0.356 to 0.401 for a maximum of 40000 features). It is clearly visible that both vectorization methods benefit from lemmatization.

Fig. 2. MCC score dependency on how words are changed to their lemmas, with or without a constraint on the maximum number of features.

C. Training epochs and embedding vector size experiment

Clustering results for several epoch counts and vector sizes are depicted in Fig. 3. The highest average MCC score, 0.381, was recorded for a vector size of 150 and 20 epochs. It is interesting to note that increasing the number of training epochs to 100 reduces the MCC to 0.316. This reduction is observed for all vector sizes and could be explained as overfitting. On the other hand, only 5 epochs give poor results, with a maximum MCC of 0.133 for a vector size of 10, and should be regarded as underfitting. With the optimal number of training epochs being 20, there are many vector sizes (from 20 to 400) yielding very similar MCC results. This shows that vector sizes as small as 20 are enough, when training on the 1500 article dataset for 20 epochs, for a good text representation.
Fig. 3. MCC score dependency on vector size and number of training epochs in Doc2Vec distributed bag of words representation clustering.

D. Clustering articles from a defined release interval

The best Doc2Vec model trained on this small corpus outperformed the best BOW model (MCC 0.318 vs. 0.145, F1 0.415 vs. 0.282). The cluster features and statistics of the Doc2Vec model are depicted in Table I. It shows that the model performs reasonably well and can distinguish:

- a very small (1.9% of all articles) but distinct weather forecast category (cluster Nr. 5);
- classical categories such as culture, sports, and crime (clusters Nr. 3, 9 and 10);
- hot topics such as the university reform, Brexit and current political scandals (clusters Nr. 1, 4 and 8).
TABLE I. CLUSTER STATISTICS

For each cluster: the number of its articles, the distribution of these articles over the ten categories used (the order of the count columns follows the original table), and the ten most descriptive features with their English translations. The ten categories are Lithuania news, World news, Crime, Business, Sports, Technologies, Opinions, Entertainment, Culture and Other.

Cluster Nr. 1, 40 articles; category counts: 11, 0, 0, 24, 0, 3, 0, 0, 0, 2.
Features: universitetas, mokslas, eur, mokykla, studija, pertvarka, akademija, rektorius, vu, kokybė // university, science, eur, school, study, transformation, academy, rector, vu (Vilnius University), quality

Cluster Nr. 2, 87 articles; category counts: 27, 0, 2, 35, 3, 15, 3, 0, 0, 2.
Features: muzika, alkoholis, kultūra, ntv, filmas, visuomenė, maistas, namas, liga, lelkaitis // music, alcohol, culture, ntv, film, society, food, house, illness, lelkaitis (surname of a person)

Cluster Nr. 3, 118 articles; category counts: 29, 1, 40, 18, 4, 1, 4, 16, 2, 3.
Features: koncertas, teatras, muzika, rež, biblioteka, festivalis, džiazas, kultūra, paroda, muziejus // concert, theater, music, dir., library, festival, jazz, culture, exhibition, museum

Cluster Nr. 4, 106 articles; category counts: 8, 0, 0, 16, 0, 1, 80, 0, 0, 1.
Features: es, brexit, derybos, le, pen, may, macronas, partija, th, politinis // eu, brexit, talks, le, pen, may, macron, party, th, political

Cluster Nr. 5, 19 articles; category counts: 0, 0, 0, 16, 0, 0, 2, 0, 0, 1.
Features: laipsnis, šiluma, temperatūra, naktis, debesis, debesuotumas, lietus, įdienojus, pūs, termometrai // degree, heat, temperature, night, cloud, cloudiness, rain, later in the day, will blow, thermometers

Cluster Nr. 6, 184 articles; category counts: 1, 0, 0, 16, 5, 0, 160, 0, 0, 2.
Features: jav, korėtis, raketa, korėja, branduolinis, putinas, jungtinis, pajėgos, karinis, sirijos // usa, korėtis, rocket, korea, nuclear, putin, united, forces, military, syrian

Cluster Nr. 7, 120 articles; category counts: 11, 1, 0, 37, 4, 9, 10, 0, 0, 48.
Features: įmonė, seimas, įstatymas, mokestis, savivaldybė, kaina, šiluma, asmuo, projektas, pajamos // company, parliament, law, tax, municipality, price, heat, person, project, income

Cluster Nr. 8, 79 articles; category counts: 4, 1, 1, 67, 0, 1, 0, 0, 2, 3.
Features: seimas, pūkas, partija, teismas, komisija, konstitucija, pirmininkas, įstatymas, apkalti, taryba // parliament, pūkas (surname of a person), party, court, commission, constitution, chairman, law, impeachment, board

Cluster Nr. 9, 64 articles; category counts: 0, 0, 0, 0, 0, 0, 0, 0, 64, 0.
Features: rungtynės, taškas, žaidėjas, čempionatas, ekipa, rinktinė, įvartis, pelnyti, pergalė, raptors // match, point, player, championship, team, national team, goal, score, victory, raptors (name of a basketball club)

Cluster Nr. 10, 184 articles; category counts: 13, 67, 2, 27, 3, 0, 68, 0, 0, 4.
Features: policija, automobilis, vyras, vairuotojas, pranešti, įtariamas, sulaikyti, žūti, teismas, asmuo // police, car, man, driver, report, suspected, detained, die, court, person
VIII. CONCLUSIONS

In this work the BOW and Doc2Vec text representation methods were compared. Our research shows that Doc2Vec greatly outperforms the BOW model: when clustering a week's worth of data, the highest MCC scores are 0.318 versus 0.145. However, for the Doc2Vec method to outperform BOW when clustering fewer than 300 articles, it must be trained on a much larger dataset. We estimated that embedding vector sizes starting from 20 are already large enough, and that the optimal number of training epochs is around 20. Analysis of converting words to their lemmas showed that lemmatization is beneficial for both the BOW and the Doc2Vec representations.

REFERENCES

[1] Aggarwal C. C., Zhai C., editors. Mining Text Data. Springer Science & Business Media, 2012.
[2] Liu L., Lu Y., Yang M., Qu Q., Zhu J., Li H. Generative adversarial network for abstractive text summarization. In: Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[3] Liu G., Guo J. Bidirectional LSTM with attention mechanism and convolutional layer for text classification. Neurocomputing, 2019.
[4] Pranckaitis V., Lukoševičius M. Clustering of Lithuanian news articles. In: Proceedings of IVUS 2017, pp. 27-32.
[5] Hinton G. E., Salakhutdinov R. R. Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786):504-507.
[6] Mackutė-Varoneckienė A., Krilavičius T. Empirical study on unsupervised feature selection for document clustering. In: Human Language Technologies – The Baltic Perspective, 2014, pp. 107-110.
[7] Ciganaitė G., Mackutė-Varoneckienė A., Krilavičius T. Text documents clustering. In: Informacinės technologijos. XIX tarpuniversitetinė magistrantų ir doktorantų konferencija "Informacinė visuomenė ir universitetinės studijos" (IVUS 2014): konferencijos pranešimų medžiaga, 2014, pp. 90-93.
[8] Kapočiūtė-Dzikienė J., Damaševičius R. Intrinsic evaluation of Lithuanian word embeddings using WordNet. In: Computer Science On-line Conference. Springer, Cham, 2018.
[9] Kapočiūtė-Dzikienė J., Damaševičius R., Woźniak M. Sentiment analysis of Lithuanian texts using traditional and deep learning approaches. Computers, 2019, 8(1):4.
[10] Aker A., Paramita M., Kurtic E., Funk A., Barker E., Hepple M., Gaizauskas R. Automatic label generation for news comment clusters. In: Proceedings of the 9th International Natural Language Generation Conference, 2016, pp. 61-69.
[11] White L., Togneri R., Liu W., Bennamoun M. Sentence representations and beyond. In: Neural Representations of Natural Language. Springer, Singapore, 2019, pp. 93-114.
[12] Le Q., Mikolov T. Distributed representations of sentences and documents. In: International Conference on Machine Learning, 2014, pp. 1188-1196.
[13] Mikolov T., Chen K., Corrado G., Dean J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[14] Aggarwal C. C., Reddy C. K. Data Clustering: Algorithms and Applications. Chapman & Hall/CRC, 2013.