=Paper=
{{Paper
|id=Vol-2350/paper11
|storemode=property
|title=Improving Topic Modeling for Textual Content with Knowledge Graph Embeddings
|pdfUrl=https://ceur-ws.org/Vol-2350/paper11.pdf
|volume=Vol-2350
|authors=Marco Brambilla,Birant Altinel
|dblpUrl=https://dblp.org/rec/conf/aaaiss/BrambillaA19
}}
==Improving Topic Modeling for Textual Content with Knowledge Graph Embeddings==
Marco Brambilla, Birant Altinel
Politecnico di Milano, DEIB
Piazza Leonardo da Vinci, 32. I-20133 Milano, Italy
{firstname.lastname}@polimi.it
Abstract

Topic modeling techniques have been applied in many scenarios in recent years, spanning textual content as well as many other data sources. Existing research in this field continuously tries to improve the accuracy and coherence of the results. Some recent works propose new methods that capture the semantic relations between words in the topic modeling process, by employing vector embeddings over knowledge bases. In this paper we study various dimensions of how knowledge graph embeddings affect topic modeling performance on textual content. In particular, the objective of the work is to determine which aspects of knowledge graph embedding have a significant and positive impact on the accuracy of the extracted topics. In order to obtain a good understanding of the impact, all steps of the process are examined and various parameterizations of the techniques are explored. Based on the findings, we improve the state of the art with the use of more advanced embedding approaches and parameterizations that produce higher quality topics. The work also includes a set of experiments with 2 variations of the knowledge base, 7 embedding methods, and 2 methods for incorporation of the embeddings into the topic modeling framework, also considering a set of variations of topic number and embedding dimensionality.

Copyright held by the author(s). In A. Martin, K. Hinkelmann, A. Gerber, D. Lenat, F. van Harmelen, P. Clark (Eds.), Proceedings of the AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2019). Stanford University, Palo Alto, California, USA, March 25-27, 2019.

Introduction

In the current age of information, larger and larger amounts of data are generated and collected every second around the world. A significant portion of this data is in the form of textual content. The need for understanding this vast amount of textual content keeps increasing as everything in the world becomes more data-driven, but mostly because it is impossible for us to do it manually.

The fields of Natural Language Processing and Machine Learning offer automated methods to understand large amounts of textual data. Vector representations of words (Mikolov et al. 2013) (Řehůřek and Sojka 2010) (Pennington, Socher, and Manning 2014) (Joulin et al. 2016) have been used for many Natural Language Processing tasks such as syntactic parsing (Socher et al. 2013a) and sentiment analysis (Socher et al. 2013b), and they are also being used in the Topic Modeling field (Hinton and Salakhutdinov 2009) (Srivastava, Salakhutdinov, and Hinton 2013) (Cao et al. 2015) (Nguyen et al. 2015) (Yao et al. 2017). One of these papers, with a method called KGE-LDA (Yao et al. 2017), aims to improve the performance of topic modeling by obtaining the vector representations of words from external knowledge bases such as WordNet (Miller 1995) and Freebase (Bollacker et al. 2008) instead of learning them from documents. According to their reported results, this approach is successful and improves topic coherence by 9.5% to 44% and document classification accuracy by 1.6% to 5.4% compared to LDA (Blei, Ng, and Jordan 2003).

Their approach improves the results with one specific method to obtain the word representations, but it is not clear whether vectors obtained through other methods that can capture the semantics of networks better are able to boost the accuracy of topic modeling. The vector embedding methods that have proven to be more successful in other fields, such as Link Prediction, can possibly capture the semantics of the external knowledge base more accurately.

Another question that remains to be answered in this context is whether a larger knowledge base in terms of entities, or a denser knowledge base in terms of relations between entities, can also contribute to better representations of words. The primary motive for this question lies in the fact that knowledge graphs do not have a complete semantic representation of the real world, and can be improved with different relations between entities.

This paper presents two approaches to improve Topic Modeling. The first approach applies various Multi-relational Network Embedding Methods by computing the vectors on the same network, and incorporating the results into the topic modeling framework that has been taken as the base method of this work. The mentioned embedding methods all follow a translation-based approach to vectors, with incremental improvements over the original work, TransE (Bordes et al. 2013). Since knowledge embeddings are increasingly used for topic modeling, there is a lack of a comprehensive study that discovers the effects of the knowledge encoded by the various methods. Therefore, the primary motive of this work is to push the state of the art in this field forward by the application of more advanced methods and knowledge bases for obtaining better knowledge graph embeddings in order to improve topic modeling.
The second approach modifies the network of the
knowledge graph itself, and manages to significantly in-
crease the density of the network by adding syntactic depen-
dency relations between words in a sentence that are com-
puted from the same text corpus used for the topic model-
ing. This combination is performed by computing the depen-
dency trees of the sentences in the text corpus, and adding
each relation to the knowledge graph between the corre-
sponding entities, thus updating and enlarging the network.
It studies the knowledge encoded by this denser network in
terms of relations between entities, and how it affects the
overall performance of embeddings, and consecutively topic
modeling.
The paper is organized as follows: Section 2 presents the related work. Section 3 details the methods employed in this paper. Section 4 describes the source code and implementations of the used methods. Section 5 presents the results of the experiments and discusses these outcomes. Section 6 concludes and outlines possible future work.
Related Work

In this section, the existing works in the literature that constitute the basis for the main focus and direction of this paper are discussed. KGE-LDA (Yao et al. 2017) is directly the baseline work on topic modeling with knowledge graph embeddings that this paper is focused on. On the other hand, LF-LDA (Nguyen et al. 2015) is an older method that introduced the idea of using embeddings of words to improve topic modeling. The discussion of these methods is aimed at creating a general perspective for the main idea and experiments that are proposed in this paper.

KGE-LDA (Yao et al. 2017) is a knowledge-based topic model that combines the well-known LDA model with entity embeddings obtained from knowledge graphs. It proposes two topic models that incorporate the vector representations of words, obtaining them from knowledge bases such as WordNet (Miller 1995) and Freebase (Bollacker et al. 2008). The two topic models are based on the previous works CI-LDA (Newman, Chemudugunta, and Smyth 2006) and Corr-LDA (Blei and Jordan 2003). The contributions of that paper create the foundations that this work studies and attempts to improve; the topic models of KGE-LDA are used here. Their claim and results show that the knowledge encoded from knowledge graphs captures the semantics better than the compared methods. In order to handle the embeddings, they propose a Gibbs Sampling inference method.

KGE-LDA extends two entity topic models, namely CI-LDA (Newman, Chemudugunta, and Smyth 2006) and Corr-LDA (Blei and Jordan 2003), in order to incorporate the learned entity embeddings into the topic model. The model based on CI-LDA is referred to as KGE-LDA(a) and the model based on Corr-LDA is referred to as KGE-LDA(b) in the paper and also throughout this work. The details regarding these approaches are discussed in the following subsections. The graphical representation of the models can be seen in Figure 1.

Figure 1: The representation of both KGE-LDA(a) and KGE-LDA(b) models (Yao et al. 2017)

LF-LDA, which stands for Latent Feature LDA, aims to improve topic modeling by incorporating latent feature vectors, with a similar point of view as KGE-LDA. The difference is that, apart from being published before KGE-LDA, this paper obtains the latent feature representations directly from the text corpus itself. It uses the well-known word2vec (Mikolov et al. 2013) method to compute the embeddings on a large text corpus, to be used later on a smaller corpus for topic modeling. Its main contribution that is relevant to this paper consists of using large external data to compute the word embeddings. LF-LDA extends two topic models, LDA (Blei, Ng, and Jordan 2003) and DMM (Nigam et al. 2000), by adding a latent feature component to the Dirichlet multinomial component that generates the words from topics in each topic model (Nguyen et al. 2015). The extended methods are called LF-LDA and LF-DMM. The graphical representation of LF-LDA can be seen in Figure 2.

Figure 2: Representation of the LF-LDA model (Nguyen et al. 2015)

Improving Knowledge Graph Embeddings for Topic Modeling

The focus of this paper is to explore the improvements in knowledge graph embeddings and their effects on topic modeling performance. There are three explored dimensions in the knowledge graph embedding process that are presumed to have a direct effect on performance. These dimensions are embedding method performance, the information in the knowledge base, and the vector dimension of the embeddings. This section describes these dimensions and how to explore them.

Embedding Methods Application

The following models are chosen for running the experiments. TransE (Bordes et al. 2013) is the model used by the authors of KGE-LDA (Yao et al. 2017), whereas the following models of the respective papers are chosen because they are either directly or indirectly compared with TransE and each other, which provides us with a better understanding of the difference in their performance.

The mentioned papers each improve the state of the art in knowledge graph embedding. The presumption is that the models which improve upon the results of TransE on other grounds, such as Link Prediction, should also deliver similar improvements in Topic Modeling results. To create a comparison on equal grounds, all of these models should be trained with the same dataset and the same parameters, and produce an output of the same embedding dimension. By keeping all other variables the same, it is possible to directly observe the quality of the embeddings for the purpose of topic modeling. The result of this approach helps determine the methods and the configurations which move the state of the art further by producing the highest accuracy in topic modeling.

The following subsections explain the main characteristics and differences of the compared embedding methods.

TransE. The TransE model represents the relations in the graph as translations in the embedding space (Bordes et al. 2013). For example, in a triple (head, relation, tail), the vector arithmetic equation head + relation = tail should hold true. In this model, a null relation vector would represent the equivalence of the head and tail entity. This also means that, if the semantics of the graph are captured correctly, the result of the vector arithmetic vector("France") - vector("Paris") + vector("Rome") should create a vector that is closest to vector("Italy") in the knowledge graph (Mikolov et al. 2013), with the assumption that the triples (Paris, capitalof, France) and (Rome, capitalof, Italy) or similar semantic relations exist. As stated before, TransE is part of the baseline method KGE-LDA that the following methods are compared to in the experiments.

TransH. The TransH model models relations as hyperplanes, in addition to the translation operations of TransE (Wang et al. 2014). The motive is the fact that there are mapping properties like reflexive, one-to-many, many-to-one and many-to-many, and there is a need to represent these mapping properties. Their claim is that TransE was not successful in preserving these properties.

DistMult. This model also directly aims to improve on the TransE model, and the main difference is the composition of vectors. Different from TransE, where vectors are composed by addition as explained in the previous subsections (head + relation = tail), DistMult composes vectors by a weighted element-wise dot product, in other words the multiplicative operation head × relation = tail (Yang et al. 2014).

TransR. The TransR model attempts to tackle the problem that a single semantic space for modeling the embeddings of all entities and relations is insufficient (Lin et al. 2015b). Building on TransE and TransH, it builds entity and relation embeddings in separate semantic spaces.

PTransE. PTransE builds upon the previous methods by utilizing multiple-step relation paths in the knowledge graph. Their approach is similar to TransE, with the addition of relation path-based learning (Lin et al. 2015a). In simple words, they join consecutive relations in a path into a single relation, such as relation1 ◦ relation2 = relation path, and use these paths in the model.

HolE. Short for "Holographic Embeddings", the difference that this model adopts is the learning of the compositional vector space representation of entire knowledge graphs (Nickel et al. 2016). It uses circular correlation as the compositional operator. The results of HolE are compared to TransE, TransR and other embedding methods in the published paper. One interesting fact is that HolE was proved to be equivalent to another method called ComplEx (Trouillon et al. 2016), which was published the same year (Hayashi and Shimbo 2017). Because of this fact, ComplEx was excluded from the experimentation in this work.

Analogy. Analogy proposes the optimization of latent feature representations with respect to the analogical properties of the embeddings of both entities and relations (Liu, Wu, and Yang 2017). It also unifies several methods in multi-relational embedding, namely DistMult (Yang et al. 2014), ComplEx (Trouillon et al. 2016) and HolE (Nickel et al. 2016). It is also compared to all previous methods mentioned in this paper in the experiments of the published paper.

In Table 1, the time and space complexities along with the scoring functions of the described methods are compared.

Knowledge Graph Extension with Dependency Trees

While the previous sections observe the effects of the embedding models and process, this section focuses on the density and quality of the knowledge graphs with which the embedding models are trained.

Therefore, as a source of new information for the knowledge graph, the text corpus itself is a great answer. The dependency relations in sentences constitute meaningful semantics, and a quite massive source of information. The questions that remain to be answered are: are the semantic relations in a knowledge graph and a dependency graph compatible with each other? Are they able to create a richer knowledge base? Are the current embedding methods able to capture the information encoded in the resulting massive graph?
Table 1: Characteristics of the different Embedding Methods. Parameters: d: embedding size, n_e: number of entities, n_r: number of relations, h: head entity, r: relation, t: tail entity, w_r: vector representation of r, M_r: projection matrix of r, p: path

Method    Time Complexity  Space Complexity             Scoring Function
TransE    O(d)             O(n_e d + n_r d)             −‖h + r − t‖_{1/2}
TransH    O(d)             O(n_e d + 2 n_r d)           −‖(h − w_r⊤ h w_r) + r − (t − w_r⊤ t w_r)‖²₂
DistMult  O(d)             O(n_e d + n_r d)             h⊤ diag(r) t
PTransE   O(d)             O(n_e d + n_r d)             −‖p − (t − h)‖
TransR    O(d²)            O(n_e d + n_r d + n_r d²)    −‖M_r h + r − M_r t‖²₂
HolE      O(d log d)       O(n_e d + n_r d)             r⊤ (h ⋆ t)
Analogy   O(d)             O(n_e d + n_r d)             h⊤ M_r t
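For illustration, three of the scoring functions in Table 1 can be sketched in plain Python. This is a minimal sketch over toy vectors, not trained embeddings; the function names are ours, and ⋆ in HolE is implemented as circular correlation by its textbook definition:

```python
import math

def transe_score(h, r, t):
    # TransE: a triple (h, r, t) is plausible when h + r ≈ t,
    # so the score is the negative L2 translation error.
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def distmult_score(h, r, t):
    # DistMult: multiplicative composition h^T diag(r) t.
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))

def hole_score(h, r, t):
    # HolE: circular correlation of h and t, matched against r:
    # [h * t]_k = sum_i h_i * t_{(i+k) mod d}.
    d = len(h)
    corr = [sum(h[i] * t[(i + k) % d] for i in range(d)) for k in range(d)]
    return sum(ri * ci for ri, ci in zip(r, corr))

# A perfectly "translated" toy triple has zero error under TransE:
print(transe_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]))
print(distmult_score([1, 2], [3, 4], [5, 6]))
print(hole_score([1, 2], [3, 4], [5, 6]))
```

Higher scores mean more plausible triples under each model; during training the methods rank observed triples above corrupted ones.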
To answer these questions, the knowledge graph used in this paper (WN18) was merged with the dependency graph obtained from the 20NG text corpus, which is also used in this paper for topic modeling. As detailed in the Datasets subsection of the Experiments section, the density of the graph increased about 5 times, which surely created a more complex semantic structure.

The general structure of the merging phase is illustrated in Figure 3. The process finds the dependency tree of each sentence. Then, the corresponding entity of each word in the knowledge graph is found. If the words and the computed dependencies pass the filtering stage, a new link is added between the corresponding entities in the knowledge graph, with the name of the dependency relation.

Figure 3: Visualization of the Knowledge Graph Extension (panels: Dependency Tree of a Sentence; Knowledge Graph; Extended Knowledge Graph)

Further Exploration of Parameters

This section aims to increase the primary parameters to measure their effects on the final outcome. The motive is that, as long as computational limits and feasibility allow, better parameters and settings should be used if they provide considerable improvements in performance. In the light of this motive, the following aspects are considered.

The first aspect to be investigated is the effect of the embedding dimension on topic modeling performance. The motive for this aspect is the fact that, as the knowledge graph or the dataset gets larger and denser, it creates more information to be stored in the embeddings. Larger vector dimensions offer more space to encode the semantics, but this naturally comes with performance costs.

Furthermore, the number of topics chosen for the topic model also has a direct effect on performance. Considering the results in KGE-LDA (Yao et al. 2017), where the accuracy increases with topic number, a significantly increased topic number and its impact should be observed.

Lastly, the extended knowledge graph method that was described in the previous section should also be examined with the increased parameters, as the information encoded from a larger graph might even provide greater performance with higher dimensional embeddings and higher topic numbers.

Implementation

Base Topic Modeling Framework

To merge the learned embeddings with the topic modeling process, the original implementation of KGE-LDA by its authors was used¹. The original implementation was chosen because KGE-LDA is the baseline work that this paper follows; thus it is the best choice for running the experimentations.

The source code is structured as a Java project, and has a dependency on the Stanford CoreNLP library. Along with KGE-LDA, the project contains the implementations for LDA (Blei, Ng, and Jordan 2003) and CTM (Blei and Lafferty 2006). Several alterations and additions were made in the implementation for the third part of the experiments (Knowledge Graph Extension). The additions are as follows:

• Parsing the 20NG dataset with the CoreNLP DependencyParser to obtain dependency trees.
• Updating the WN18 graph with the obtained dependencies.
• Various minor alterations throughout the source code.

Embedding Methods

For the purpose of the experimentations for the Embedding Method Comparison, the implementations of the chosen embedding methods were needed. Therefore, the implementations of TransE, TransH, TransR and PTransE were taken from the open-source project KB2E².

¹ https://github.com/yao8839836/KGE-LDA
² https://github.com/thunlp/KB2E/
The implementations of DistMult, HolE and Analogy were taken from the open-source project OpenKE³.

Dependency Parser

For the purpose of the Knowledge Graph Extension part of this paper, the Stanford CoreNLP DependencyParser annotator was used. Using the DependencyParser, the code for the Knowledge Graph Extension part was implemented in Java. The process and the implementation follow this algorithm:

Algorithm 1: Knowledge Graph Extension with Dependency Trees
  KnowledgeGraph ← WN18
  DependencyNetwork ← empty graph
  for Document d in 20NG do
    for Sentence s in d do
      t ← DependencyParser(s)
      DependencyNetwork append t
    end
  end
  KnowledgeGraph merge DependencyNetwork
  return KnowledgeGraph

To visualize how the dependency relations are merged with the knowledge graph, please refer to Figure 4.

Figure 4: An Example of Merging a Dependency Relation from a Sentence with the Knowledge Graph (the sentence "Furthermore, sales of satellite ground equipment should go up in the next revision of this data." is parsed; the "Compound" relation between "Equipment" and "Satellite" is added alongside the existing "Hyponym" relation)

The example in Figure 4 shows how a dependency relation extracted from a sentence updates the knowledge graph. In this specific example, there is a "Hyponym" relation from the "Equipment" entity to the "Satellite" entity in the knowledge graph. The dependency parser finds out that these two words are used in a compound in the corresponding sentence, and updates the knowledge graph with the "Compound" relation.

Experiments

In this section, a series of experiments that involve different methods and variations of parameters are presented. The used datasets, along with the chosen parameters, are stated for each of the different experiment sets.

The experiments are conducted to find answers to the following questions:
1. Are newer and improved embedding models able to capture better semantics for the purpose of topic modeling?
2. How does the number of topics affect the performance of these sets of methods?
3. Does a denser and more complex knowledge base create a better or worse encoding of entities?
4. What is the importance of the vector dimensions in capturing and encoding information? Do we need larger vectors for more accurate representations for the used datasets?

The experiments are grouped into three categories that each try to answer the corresponding questions stated above. We proceed with three sets of experiments: (1) Embedding Method Application and Comparison; (2) Knowledge Graph Extension; (3) Further Exploration of Parameters.

Baselines

Two topic models are chosen to compare the results of the experiments with:

• LDA (Blei, Ng, and Jordan 2003)
• KGE-LDA (Yao et al. 2017)

LDA was chosen as the primary indicator of performance because it is the most widely used topic model, which is considered the baseline method for many other works in the field. KGE-LDA was chosen as the main indicator of performance since it is the baseline method and starting point of this work.

Datasets

Text Corpus. The datasets in the context of this work refer to the text corpus that is used to run the topic models. For this purpose, the 20-Newsgroups (20NG) dataset was used. The dataset includes 18,846 documents, split into 20 categories, with a vocabulary of 20,881 distinct words. In the text preprocessing phase, the following steps are applied to the data: tokenization (with Stanford CoreNLP), stopword removal, and rare word removal (for words that appear less than 10 times throughout the dataset).

External Knowledge. The external knowledge refers to the knowledge graph that was used to train the representation learning methods to obtain the word embeddings. WN18, which is a subset of the widely used lexical knowledge graph WordNet, was used for this purpose. WN18 has the following characteristics in the training set: 141,442 triplets (the missing 10,000 triplets of WN18 are in the test and validation sets); 40,943 entities; 18 types of relations; 8,819 common entities with the 20NG vocabulary.

Table 2 shows the top 10 occurring relation types in the knowledge graph, their occurrence counts, and their percentages in size over the whole graph.

³ https://github.com/thunlp/OpenKE
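The extension procedure of Algorithm 1 and Figure 4 can be sketched as follows. This is an illustrative sketch, not the paper's Java code: `parse_dependencies` is a hypothetical stand-in for the CoreNLP DependencyParser that here just returns canned (governor, relation, dependent) triples for the running example:

```python
def parse_dependencies(sentence):
    # Hypothetical stand-in for the CoreNLP DependencyParser:
    # returns canned dependency triples for the Figure 4 example.
    canned = {
        "satellite ground equipment": [("equipment", "compound", "satellite")],
    }
    return canned.get(sentence, [])

def extend_knowledge_graph(knowledge_graph, corpus, word_to_entity):
    """Add one (entity, dep_relation, entity) triple per dependency edge
    whose two words both map to entities in the knowledge graph."""
    extended = set(knowledge_graph)
    for document in corpus:
        for sentence in document:
            for head, relation, dependent in parse_dependencies(sentence):
                # Filtering stage: keep only word pairs that have
                # corresponding entities in the knowledge graph.
                if head in word_to_entity and dependent in word_to_entity:
                    extended.add((word_to_entity[head], relation,
                                  word_to_entity[dependent]))
    return extended

# Toy run mirroring Figure 4: a WN18-style triple plus one parsed sentence.
kg = {("equipment", "hyponym", "satellite")}
corpus = [["satellite ground equipment"]]
entities = {"equipment": "equipment", "satellite": "satellite"}
print(sorted(extend_knowledge_graph(kg, corpus, entities)))
```

After the merge, the graph contains both the original "Hyponym" triple and the new "Compound" triple between the same two entities, which is exactly the densification measured in the Extended Knowledge Graph subsection.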
Table 2: Occurrence Counts and Percentages of Top 10 Relations in the Original WN18 Dataset

Relation                     Count   Percentage of Graph
Hyponym                      34832   24.6%
Hypernym                     34796   24.6%
Derivationally Related Form  29715   21.0%
Member Meronym               7402    5.23%
Member Holonym               7382    5.22%
Has Part                     4816    3.40%
Part Of                      4805    3.40%
Member of Domain Topic       3118    2.20%
Synset Domain Topic of       3116    2.20%
Instance Hyponym             2935    2.08%

Extended Knowledge Graph. As mentioned before, the knowledge graph in the previous subsection was merged with the dependency graph obtained from the 20NG text corpus. The resulting graph has the following characteristics, which have increased relative to the original knowledge graph (WN18):

• 817,568 triplets, with respect to the original 141,442;
• 55 types of relations, increased from the original 18.

There were new relations introduced to the knowledge graph, but no new entities. To demonstrate how the knowledge graph changed, Table 3 lists the top 10 occurring relation types, their occurrence counts, and their percentages in size over the whole graph.

Table 3: Occurrence Counts and Percentages of Top 10 Relations in the Extended WN18 Dataset

Relation             Count   Percentage of Graph
Root                 30117   15.0%
Nominal Modifier     90531   11.1%
Compound             78654   9.62%
Direct Object        56423   6.90%
Adjectival Modifier  53819   6.58%
Dependent            35930   4.39%
Hyponym              34832   4.26%
Hypernym             34796   4.26%
Conjunct             33223   4.06%
Auxiliary            30775   3.76%

It can be seen that the structure of the knowledge graph has changed substantially, with the high number of additions. With the extension, the size of the graph grew by 578% compared to the original knowledge graph, and 37 new relation types were added.

Settings

A set of settings of the different parameters has been defined for the execution and validation of the approach. Some parameters have been adopted with a constant value across the experiments, while others vary across experiments. The settings considered include:

1. Settings for Embedding Methods Comparison: all parameters have been fixed, except for the number of topics (and the respective parameter α), as reported in Table 4.

2. Settings for Knowledge Graph Extension: the settings are the same as those of the Embedding Methods Comparison group.

3. Settings for Further Exploration of Parameters: with the aim of delving into a detailed investigation of the parameter values, a further set of experiments with new variations of the settings has been launched, with values as reported in Table 5. With respect to the initial experiments (parametrized as in point 1 of this list), the embedding dimension is increased to 100 and the number of topics is increased to 100.

Table 4: Embedding Methods Comparison Settings

Parameter Name            Parameter Value
Embedding Dimension       50
Gibbs Sampling Iterations 1000
Learning Rate             0.001
Hyperparameter α          50/K (#Topics)
Hyperparameter β          0.01
Number of Topics (K)      20, 30, 40, 50

Table 5: Further Exploration Experiment Settings

Parameter Name       Parameter Value
Embedding Dimension  100
Topic Number         50, 100

Results

The results are obtained through two different evaluation mechanisms, namely Topic Coherence and Document Classification. The UCI method, which uses Pointwise Mutual Information (Newman et al. 2010), was used for Topic Coherence, and the LIBLINEAR linear classification library (Fan et al. 2008) was used for Document Classification. In the rest of the section, these results are presented and discussed.

Embedding Methods Comparison

Topic Coherence Results. As stated before, PMI-based topic coherence was used to obtain these results. To compute PMI, a dataset of 4,776,093 Wikipedia articles was used. For each method and topic number, the experiments were run 5 times, after which the average and the standard deviation were calculated. The results can be found in Table 6.
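A minimal sketch of the PMI-based UCI coherence (Newman et al. 2010) may clarify the metric; here document-level co-occurrence in a toy reference corpus stands in for the Wikipedia dump, and the function name and smoothing constant are our own choices:

```python
import math
from itertools import combinations

def uci_coherence(topic_words, reference_docs, eps=1e-12):
    """Average PMI over all pairs of top topic words, with probabilities
    estimated from document-level occurrence in a reference corpus."""
    n_docs = len(reference_docs)
    docs = [set(d) for d in reference_docs]
    def p(*words):
        # Fraction of reference documents containing all given words.
        return sum(all(w in d for w in words) for d in docs) / n_docs
    pmis = [math.log((p(w1, w2) + eps) / (p(w1) * p(w2) + eps))
            for w1, w2 in combinations(topic_words, 2)]
    return sum(pmis) / len(pmis)

# Words that always co-occur get positive PMI (log 2 here);
# words that never co-occur are penalized.
docs = [["satellite", "orbit"], ["satellite", "orbit"], ["window"], ["window"]]
print(uci_coherence(["satellite", "orbit"], docs))
```

A coherent topic is one whose top words tend to appear in the same reference documents, so higher average PMI means a more interpretable topic.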
Figure 5: Topic Coherence Scores of Topic Modeling Obtained Through Different Embedding Methods (TransE, TransH, DistMult, PTransE RNN, TransR, HolE, Analogy) with the Incorporation Model A, Separated by Topic Number (20, 30, 40, 50)

Figure 6: Topic Coherence Scores of Topic Modeling Obtained Through Different Embedding Methods with the Incorporation Model B, Separated by Topic Number (20, 30, 40, 50)
Table 6: Topic Coherence Results of Embedding Methods see TransR performing better than other methods. It is also
Table 6: Topic Coherence Results of Embedding Methods on Topic Modeling. The best results are reported in bold.

Model            K = 20       K = 30       K = 40       K = 50
LDA              68.4±2.63    72.5±1.87    70.9±1.74    71.6±0.45
TransE(a)        68.8±3.56    70.6±2.08    69.6±1.13    71.4±1.82
TransE(b)        70.2±1.79    70.3±0.52    70±1.56      71±1.41
TransH(a)        67.5±2.4     70.1±1.4     69.7±0.99    70.3±1.37
TransH(b)        68.8±1.25    69.5±1.51    70.5±0.8     70.4±0.35
DistMult(a)      68.7±1.97    69.8±0.95    70.7±1.34    71.3±1.62
DistMult(b)      67.8±2.11    69.9±2.25    70±0.79      70.2±1.8
PTransE RNN(a)   70.6±3.1     69±1.66      69.6±1.81    70.6±1.93
PTransE RNN(b)   68.8±3.21    69.8±1.8     70.3±1.92    69.8±0.96
TransR(a)        72.3±2.41    68±1.27      69.5±0.95    70.6±1.79
TransR(b)        66.7±2.09    69.5±1.84    68.4±1.59    70.9±0.54
HolE(a)          69.2±3.51    70.3±1.78    70.4±1.21    70.2±2.14
HolE(b)          68.8±1.23    69.6±2.33    70.6±2.2     70.4±1.78
Analogy(a)       69±1.88      70.3±2.5     69.4±2.1     71±0.52
Analogy(b)       68.7±2.25    70.4±1.24    69.4±2.44    72.6±1.41

Overall Topic Coherence Results  The best and second-best coherence scores differ for each topic number, and it should be noted that the performance of the original LDA is consistently good. TransR leads to more coherent topics at lower topic numbers, while Analogy performs best at higher topic numbers. The general trend shows improvement with higher topic numbers.

Model A on Topic Coherence  For 30, 40, and 50 topics, the topic coherence results are close to each other and fall in the same range. The only significant difference in coherence can be observed at 20 topics, where it is worth mentioning that TransR performs better than at higher topic numbers while performing worst at 30 topics. With 20 topics, the standard deviation is also higher than at higher topic numbers, and both the best (TransR) and worst (TransH) scores of all the combinations occur there.

Model B on Topic Coherence  With Model B, there is also a general trend of improvement with topic number, and the standard deviation generally gets smaller as the topic number increases. TransR scores the lowest at 20 topics, even though it scored the highest at 20 topics with Model A. The highest-scoring combination is the Analogy method with 50 topics.

Document Classification Results  The documents have been classified using LIBLINEAR (Fan et al. 2008). For each method and topic number, the experiments were run 5 times; the averages and standard deviations are reported in Table 7.

Overall Document Classification Results  Table 7 shows that across topic numbers 20, 30, 40, and 50, HolE and Analogy perform best overall. On average, Model A yields slightly better scores than Model B, even though Analogy performs better with Model B. Another observation is that performance almost always increases with topic number, and that the results for 40 and 50 topics are closer to each other than for other topic-number increments.
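The classification setup described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the document-topic proportions and class labels are random stand-ins for the output of the trained topic models, and scikit-learn's LinearSVC (which wraps LIBLINEAR) stands in for the LIBLINEAR setup.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
K = 20        # number of topics, as in the K = 20 column
n_docs = 200  # hypothetical corpus size

# Hypothetical document-topic proportions; in the paper these come from
# the (KGE-)LDA models, here they are random Dirichlet stand-ins.
theta = rng.dirichlet(np.ones(K), size=n_docs)
labels = rng.integers(0, 4, size=n_docs)  # hypothetical class labels

# LinearSVC is backed by LIBLINEAR (Fan et al. 2008); accuracy is averaged
# over folds, mirroring the repeated-run averages reported in Table 7.
clf = LinearSVC(C=1.0, max_iter=10000)
scores = cross_val_score(clf, theta, labels, cv=5)
print(scores.mean(), scores.std())
```

With random stand-in data the accuracy is near chance; only with real topic proportions do the differences between embedding methods become visible.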
Table 7: Classification Results of Embedding Methods on Topic Modeling. The best results are reported in bold.

Model            K = 20         K = 30         K = 40         K = 50
LDA              0.539±0.028    0.633±0.022    0.695±0.022    0.69±0.022
TransE(a)        0.57±0.024     0.677±0.013    0.705±0.011    0.694±0.017
TransE(b)        0.554±0.017    0.670±0.017    0.676±0.022    0.714±0.006
TransH(a)        0.567±0.032    0.668±0.027    0.71±0.019     0.714±0.009
TransH(b)        0.555±0.014    0.666±0.035    0.694±0.013    0.697±0.024
DistMult(a)      0.59±0.021     0.644±0.015    0.706±0.019    0.702±0.026
DistMult(b)      0.587±0.017    0.667±0.014    0.687±0.014    0.694±0.025
PTransE RNN(a)   0.567±0.024    0.667±0.024    0.701±0.012    0.709±0.010
PTransE RNN(b)   0.576±0.016    0.659±0.015    0.684±0.024    0.701±0.021
TransR(a)        0.574±0.012    0.656±0.018    0.687±0.022    0.716±0.011
TransR(b)        0.555±0.035    0.662±0.022    0.692±0.005    0.695±0.026
HolE(a)          0.597±0.032    0.679±0.032    0.697±0.021    0.707±0.004
HolE(b)          0.563±0.022    0.668±0.034    0.684±0.026    0.713±0.017
Analogy(a)       0.579±0.014    0.641±0.037    0.704±0.022    0.715±0.009
Analogy(b)       0.554±0.004    0.687±0.017    0.676±0.022    0.719±0.006

Figure 8: Document Classification Accuracy of Topic Modeling Obtained Through Different Embedding Methods with the Incorporation Model B, Separated by Topic Number
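The embedding methods compared above differ mainly in how they score a knowledge-graph triple (h, r, t). As a rough illustration (simplified scoring functions only, not the full training objectives or the exact implementations used here), TransE and DistMult can be sketched as:

```python
import numpy as np

def transe_score(h, r, t):
    # TransE (Bordes et al. 2013): translation-based; the tail embedding t
    # should lie close to h + r, so a smaller distance means a better score.
    return -np.linalg.norm(h + r - t, ord=2)

def distmult_score(h, r, t):
    # DistMult (Yang et al. 2014): bilinear score with a diagonal relation
    # matrix, i.e. an elementwise triple product.
    return float(np.sum(h * r * t))

rng = np.random.default_rng(42)
h, r, t = rng.normal(size=(3, 50))  # hypothetical 50-dimensional embeddings
print(transe_score(h, r, t), distmult_score(h, r, t))
```

Note that DistMult's score is symmetric in head and tail, a limitation that HolE and Analogy address with richer (circular-correlation and mixed block-diagonal) relation operators.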
Figure 7: Document Classification Accuracy of Topic Modeling Obtained Through Different Embedding Methods with the Incorporation Model A, Separated by Topic Number

Model A  The results of Model A show that at 20 topics, HolE and DistMult perform best; their approach is apparently better suited to small topic numbers, and Analogy performs close to them. At 30 topics, HolE again scores best; however, this time DistMult scores low, while TransE, TransH, and TransR, which employ addition-based translation, score better. At 40 topics, the performance of all methods converges, with all of them scoring more similarly than they do at other topic numbers. The 50-topic results are also relatively similar, with TransR, Analogy, and TransH scoring best.
The outcomes show that HolE is the best performer overall with Model A. Looking at the standard deviations, the methods seem to have similar consistency in their results with Model A.

Model B  The main difference of Model B is that it generates the entity embeddings by the topics in the same document, so it is important to note that the embeddings of the best-performing methods are a better fit for this approach.
The results of Model B reveal that at 20 topics, DistMult is the best performer along with PTransE RNN. At 30 topics, Analogy outperforms the others, which all score similarly to each other. At 40 topics, TransH and TransR score far better than the others. At 50 topics, Analogy outperforms the others, with TransE and HolE scoring close behind.
The outcomes show that Analogy and DistMult are the best performers overall with Model B. It is also important to note that Analogy gives more consistent results across multiple runs, which can be seen in its lower standard deviation compared to the other methods.

Knowledge Graph Extension

Topic Coherence Results  The topic coherence experiments were run according to the parameters specified before. Each experiment was run 5 times, with the averages and standard deviations reported in Table 8.
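For reference, the kind of coherence measure used in such evaluations can be illustrated with a small reimplementation in the spirit of the UMass document co-occurrence measure (cf. Newman et al. 2010). This is a hedged sketch with toy documents, not the exact scorer or parameters used in these experiments:

```python
import math
from itertools import combinations

def umass_coherence(top_words, docs, eps=1.0):
    """Sum over ranked word pairs of log((D(w_i, w_j) + eps) / D(w_i)),
    where D counts documents containing the given words. Assumes every
    top word occurs in at least one document."""
    doc_sets = [set(d) for d in docs]
    def d_count(*words):
        return sum(all(w in ds for w in words) for ds in doc_sets)
    score = 0.0
    for w_earlier, w_later in combinations(top_words, 2):
        score += math.log((d_count(w_earlier, w_later) + eps) / d_count(w_earlier))
    return score

# Toy corpus: each inner list is one tokenized document.
docs = [["graph", "embedding", "topic"],
        ["topic", "model"],
        ["embedding", "vector", "graph"],
        ["graph", "topic"]]
print(umass_coherence(["graph", "topic", "embedding"], docs))
```

Higher (less negative) values indicate that the topic's top words co-occur more often, which is what the tables in this section compare across embedding methods.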
Table 8: Topic Coherence Results with Knowledge Graph Extension. The best results are reported in bold.

Model            K = 20       K = 30       K = 40       K = 50
Orig. K.G. (a)   68.8±3.56    70.6±2.08    69.6±1.13    71.4±1.82
Orig. K.G. (b)   70.2±1.79    70.3±0.52    70±1.56      71±1.41
Ext. K.G. (a)    70.5±2.44    69.5±1.08    70.4±1.44    70.7±2.56
Ext. K.G. (b)    68.1±0.48    70.1±2.13    71.5±3.01    71.3±0.7

Topic Coherence with Knowledge Graph Extension Overview  The results in Table 8 show that the Extended Knowledge Graph led to results similar to those of the Original Knowledge Graph. An overall inspection of the table shows that the best performances are distributed across different models and graphs. The version with the Extended Knowledge Graph provided better average scores for 20 and 40 topics. The overall trend is also similar to the topic coherence results of the previous section, as 30, 40, and 50 topics resulted in the same range of performance as each other.

Figure 9: Topic Coherence Scores of Topic Modeling Obtained Through Original Knowledge Graph and Extended Knowledge Graph, on Different Topic Numbers

Document Classification Results  The experiments in this section were also run 5 times, like the ones before. The averages and standard deviations are reported in Table 9.

Table 9: Document Classification Results with Knowledge Graph Extension. The best results are reported in bold.

Model            K = 20         K = 30         K = 40         K = 50
Orig. K.G. (a)   0.57±0.024     0.677±0.013    0.705±0.011    0.694±0.017
Orig. K.G. (b)   0.554±0.017    0.670±0.017    0.676±0.022    0.714±0.006
Ext. K.G. (a)    0.582±0.017    0.683±0.032    0.692±0.010    0.711±0.027
Ext. K.G. (b)    0.566±0.015    0.656±0.014    0.695±0.018    0.716±0.010

Document Classification with Knowledge Graph Extension Overview  The results in Table 9 show that the knowledge graph extension created better semantics in the graph, which in turn was reflected in the classification results. We see an overall improvement with both Model A and Model B, with the improvements for Model A being larger. The Extended Graph with Model A performs better at smaller topic numbers, whereas the extended graph with Model B is more accurate at larger topic numbers.

Figure 10: Document Classification Accuracies of Topic Modeling Obtained Through Original Knowledge Graph and Extended Knowledge Graph, on Different Topic Numbers

Increased Topic Number and Embedding Dimension  The experiments in this section correspond to the previous subsections' further exploration of parameters. For this purpose, an increased topic number of 100 and an increased embedding dimension of 100 were used with TransE and Analogy on the original knowledge graph, and additionally with TransE on the Extended Knowledge Graph. The averages and standard deviations obtained from 5 runs of each combination are reported in Tables 10 and 11.

Table 10: Topic Coherence Results with 100 Dimensional Embeddings. The best results are reported in bold.

Model                        K = 50       K = 100
TransE on Orig. K.G. (a)     70.1±1.1     73.4±1.71
TransE on Orig. K.G. (b)     71.5±1.89    73.5±1.11
Analogy on Orig. K.G. (a)    69.4±0.82    72.7±1.73
Analogy on Orig. K.G. (b)    70±0.81      75.1±2.21
TransE on Ext. K.G. (a)      70.1±1.44    72.7±0.8
TransE on Ext. K.G. (b)      71.5±0.19    73.4±0.84

According to the topic coherence scores, the extended knowledge graph provides better performance at 50 topics than both TransE and Analogy on the original graph. Even though it scores the same as the corresponding configuration on the Original Knowledge Graph, its standard deviation is 90% lower. At 100 topics, Analogy with Model B stands out with the highest coherence score obtained throughout the experiments of this work, scoring 2.18% higher than the closest coherence score. Figure 11 offers a clear visual comparison of these results.

Figure 11: Topic Coherence Scores of Topic Modeling Obtained Through Specified Method and Knowledge Graph Combinations, with 100 Dimensional Embeddings on Different Topic Numbers

Table 11: Document Classification Results with 100 Dimensional Embeddings. The best results are reported in bold.

Model                        K = 50         K = 100
TransE on Orig. K.G. (a)     0.712±0.020    0.725±0.009
TransE on Orig. K.G. (b)     0.705±0.009    0.724±0.006
Analogy on Orig. K.G. (a)    0.711±0.010    0.73±0.010
Analogy on Orig. K.G. (b)    0.706±0.010    0.727±0.010
TransE on Ext. K.G. (a)      0.712±0.011    0.734±0.002
TransE on Ext. K.G. (b)      0.693±0.019    0.726±0.013

The extended knowledge graph scores the highest document classification accuracy for both 50 topics and 100 topics with Model A. In fact, the Extended Graph with Model A at 100 topics scored the highest document classification accuracy throughout the experiments of this work, scoring 1.24% higher than the same configuration with the Original Knowledge Graph. At 50 topics, it scored the same average as the Original Knowledge Graph, but with a smaller standard deviation. According to these results, the Extended Knowledge Graph leads to better accuracy than the Original Knowledge Graph, with the exception of 50 topics with Model B. It also performs better than Analogy with Model A. These results can also be clearly seen in Figure 12.

Figure 12: Document Classification Accuracy of Topic Modeling Obtained Through Specified Method and Knowledge Graph Combinations, with 100 Dimensional Embeddings on Different Topic Numbers

Runtime Duration
The experiments were conducted on a computer with the following relevant technical specifications:
• Intel Core i5-8250U CPU @ 1.60GHz
• 8 GB of DDR4 RAM @ 1866 MHz

Throughout the experiments, the elapsed execution time was measured. The embedding methods were run only once to obtain the representations from the knowledge graph. The fastest embedding happened to be TransE, with approximately 1 hour of computation, and the slowest was HolE, with approximately 17 hours; all other methods ran for between 1 and 2 hours. It is safe to say that HolE was exceptionally slow during the training phase compared to the other methods.
The more crucial and overall more time-consuming part was running the topic models with the obtained representations. The duration of the topic modeling phase was not affected by which method produced the representations, as they all provide an output of the same size. However, the topic number and embedding size had a significant effect on the execution time. The average durations are reported in two separate tables: for an embedding size of 50 the results can be seen in Table 12, and for an embedding size of 100 in Table 13.

Table 12: Average execution time of Topic Modeling with 50-dimensional embeddings (in minutes) depending on the number of topics K.

          K = 20   K = 30   K = 40   K = 50
Model A   133      142      164      189
Model B   121      145      162      216
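The elapsed-time measurements reported in these tables can be collected with a simple wall-clock harness like the following sketch (a generic illustration, not the authors' instrumentation; `sum` over a range stands in for a topic-model training call):

```python
import time

def run_timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed time in minutes), matching the
    per-configuration durations reported in Tables 12 and 13."""
    start = time.perf_counter()  # monotonic wall clock, robust to system clock changes
    result = fn(*args, **kwargs)
    elapsed_min = (time.perf_counter() - start) / 60.0
    return result, elapsed_min

# Hypothetical usage with a cheap stand-in workload:
result, minutes = run_timed(sum, range(1_000_000))
print(result, round(minutes, 4))
```

Averaging `elapsed_min` over the 5 repeated runs per configuration gives the kind of per-cell values shown in the tables.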
Table 13: Average execution time of Topic Modeling with 100-dimensional embeddings (in minutes) depending on the number of topics K.

          K = 50   K = 100
Model A   217      340
Model B   209      314

To interpret the execution times more clearly, Figure 13 provides a visual representation. It can be seen in the figure that at K = 50, moving from 50-dimensional to 100-dimensional embeddings decreases the runtime duration by 3.3% with Model B and increases it by 14.5% with Model A.
However, an increase from 50 topics to 100 topics increases the runtime duration by 79.9% with Model A and 45.4% with Model B. Considering these facts together with the general trend of growth in the figure, it is safe to say that the topic number has a larger impact on the runtime duration than the embedding size during topic modeling.

Figure 13: Execution times (in minutes).

Discussion
The results on topic coherence throughout the three experiments share a similar pattern. From 30 topics upwards, the scores are very similar once the standard deviation is taken into account, with only a few results differing significantly. These scores also do not vary much between the different methods, for both incorporation models A and B. For the results at 20 topics, the differences between methods are larger. With the increased parameters of 100 topics and 100-dimensional embeddings, the highest score achieved is 75.1±2.21 by Analogy, which scored 72.6±1.41 with 50 topics and 50-dimensional embeddings. The topic coherence at 100 topics shows that the Analogy with Model B configuration is successful also with higher-dimensional embeddings and higher topic numbers.
Therefore, some inferences can be made about the effects of embedding methods on topic coherence. The coherence increases with topic number on average, but inconsistently: a general trend of increase is seen, except at 40 topics, which resulted in lower coherence scores in general than 30 topics. The different embedding methods produce topic coherence results that are within a 1.2% range of each other on average. Analogy with Model B leads to the highest coherence scores at high topic numbers. The extended knowledge graph clearly improved the document classification accuracy, with the exception of 40 topics; the improvements in topic coherence are at 20 and 40 topics.
For general-purpose use, Analogy is a clear choice over DistMult and HolE. The first reason is that Analogy is a generalized method which can reproduce DistMult and HolE with a suitable selection of parameters, so it allows a wider range of performance and parameters; a grid search should therefore be able to find a configuration that is better than DistMult and HolE. The second reason is that even though HolE and Analogy with the same parameters perform quite similarly, it takes much longer to train HolE (∼17 hours) than Analogy (∼1-2 hours). Much faster training, while theoretically being able to produce the same results as HolE, makes Analogy the more feasible option.
The document classification evaluation produced results that are clearer and easier to interpret in general. With small exceptions, an increased topic number produced better results. In the embedding method comparison, some of the newer and more complex embedding methods, such as DistMult, HolE, and Analogy, led to higher classification accuracy. Model A is on average 1% better than Model B, but the two produce equally consistent results, with the same standard deviation of 1.9% on average.
On the other hand, there are clear improvements in document classification accuracy when the Extended Knowledge Graph is used to train the embedding methods. This means that the semantic structure of the knowledge graph was enhanced, which was reflected in better vector representations of entities and relations.
In the last group of experiments, the Extended Knowledge Graph provides better results than TransE and Analogy on the Original Knowledge Graph, with an accuracy of 0.734±0.002, the highest accuracy recorded throughout the experiments in this work.
In light of these outcomes, the following inferences are made about the effects of embedding methods on document classification. The accuracy consistently increases with topic number. Changes in embedding method performance are reflected in the document classification accuracy. Analogy with Model B leads to the highest accuracy scores at high topic numbers. The extended knowledge graph led to increased accuracy, showing that dependency trees enhanced the semantics of the knowledge graph.

Conclusion
This paper explored the incorporation of knowledge graph embeddings into topic modeling, by experimenting on various aspects and identifying ways to improve. These aspects were the semantic information in the source knowledge graph, the different embedding methods, and the performance effects of topic numbers and embedding dimensions. The performance of 7 embedding methods, 2 topic models, 2 variations of the knowledge base, and various parameters
have been explored in the context of Topic Modeling. Two evaluation methods, namely Topic Coherence and Document Classification, have been used to measure the success of the experiments. In light of these results, this work has made several contributions.
In the embedding methods comparison, Topic Coherence and Document Classification yield different performance for each method, but the results have similarities. The most obvious pattern is the performance of Analogy: it outperforms all other methods at higher topic numbers with Model B. For lower topic numbers, simpler methods like TransE and TransR produce the best results. Overall, the best average scores come from HolE.
The Knowledge Graph Extension scores results similar to the original graph on Topic Coherence, but on Document Classification it clearly improves the accuracy. With increased parameters and embedding dimension, the improvements of the Knowledge Graph Extension are clearer, especially in Document Classification.
The best-performing embedding method, Analogy with Model B, achieves an average improvement of 0.50% over the baseline method (KGE-LDA using TransE) in Topic Coherence, and an average improvement of 1.01% over the same baseline in Document Classification. The Knowledge Graph Extension achieves an average improvement of 0.52% over the Original Knowledge Graph in Topic Coherence, and an average improvement of 0.77% in Document Classification.
As a closing remark, the best combination of embedding method, incorporation model, and parameters is Analogy with Model B at high topic numbers, with a high embedding dimension. The extension of the knowledge base, along with a high embedding dimension, enables more information to be encoded into the vectors, which in turn creates a more accurate representation of the entities compared to the Original Knowledge Graph. This performance improvement of the Extended Knowledge Graph comes with a 578% growth in the size of the graph.
It has been shown that Analogy is the most suitable embedding method. Secondly, the results clearly show that the Extended Knowledge Graph improved both the Topic Coherence score and the Document Classification accuracy.
Deeper investigation of a few points could provide further improvements to the solution. For the embedding method comparison, the different methods were tested with the same parameters, which provided an equal ground for the methods to compete with each other; however, a comprehensive parameter grid search for each embedding method could increase their performance and reveal more realistic values. Finally, as the specific knowledge graph extension used in the experiments yielded better results, the capabilities of the knowledge graph can be explored further.

References
[Blei and Jordan 2003] Blei, D. M., and Jordan, M. I. 2003. Modeling annotated data. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 127–134. ACM.
[Blei and Lafferty 2006] Blei, D., and Lafferty, J. 2006. Correlated topic models. Advances in Neural Information Processing Systems 18:147.
[Blei, Ng, and Jordan 2003] Blei, D. M.; Ng, A. Y.; and Jordan, M. I. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3(Jan):993–1022.
[Bollacker et al. 2008] Bollacker, K.; Evans, C.; Paritosh, P.; Sturge, T.; and Taylor, J. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 1247–1250. ACM.
[Bordes et al. 2013] Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; and Yakhnenko, O. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, 2787–2795.
[Cao et al. 2015] Cao, Z.; Li, S.; Liu, Y.; Li, W.; and Ji, H. 2015. A novel neural topic model and its supervised extension. In AAAI, 2210–2216.
[Fan et al. 2008] Fan, R.-E.; Chang, K.-W.; Hsieh, C.-J.; Wang, X.-R.; and Lin, C.-J. 2008. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research 9(Aug):1871–1874.
[Hayashi and Shimbo 2017] Hayashi, K., and Shimbo, M. 2017. On the equivalence of holographic and complex embeddings for link prediction. arXiv preprint arXiv:1702.05563.
[Hinton and Salakhutdinov 2009] Hinton, G. E., and Salakhutdinov, R. R. 2009. Replicated softmax: an undirected topic model. In Advances in Neural Information Processing Systems, 1607–1614.
[Joulin et al. 2016] Joulin, A.; Grave, E.; Bojanowski, P.; and Mikolov, T. 2016. Bag of tricks for efficient text classification. CoRR abs/1607.01759.
[Lin et al. 2015a] Lin, Y.; Liu, Z.; Luan, H.; Sun, M.; Rao, S.; and Liu, S. 2015a. Modeling relation paths for representation learning of knowledge bases. arXiv preprint arXiv:1506.00379.
[Lin et al. 2015b] Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; and Zhu, X. 2015b. Learning entity and relation embeddings for knowledge graph completion. In AAAI, volume 15, 2181–2187.
[Liu, Wu, and Yang 2017] Liu, H.; Wu, Y.; and Yang, Y. 2017. Analogical inference for multi-relational embeddings. arXiv preprint arXiv:1705.02426.
[Mikolov et al. 2013] Mikolov, T.; Chen, K.; Corrado, G.; Dean, J.; Sutskever, L.; and Zweig, G. 2013. word2vec. URL https://code.google.com/p/word2vec.
[Miller 1995] Miller, G. A. 1995. WordNet: a lexical database for English. Communications of the ACM 38(11):39–41.
[Newman et al. 2010] Newman, D.; Lau, J. H.; Grieser, K.; and Baldwin, T. 2010. Automatic evaluation of topic coherence. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 100–108. Association for Computational Linguistics.
[Newman, Chemudugunta, and Smyth 2006] Newman, D.; Chemudugunta, C.; and Smyth, P. 2006. Statistical entity-topic models. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 680–686. ACM.
[Nguyen et al. 2015] Nguyen, D. Q.; Billingsley, R.; Du, L.; and Johnson, M. 2015. Improving topic models with latent feature word representations. Transactions of the Association for Computational Linguistics 3:299–313.
[Nickel et al. 2016] Nickel, M.; Rosasco, L.; Poggio, T. A.; et al. 2016. Holographic embeddings of knowledge graphs. In AAAI, volume 2, 3–2.
[Nigam et al. 2000] Nigam, K.; McCallum, A. K.; Thrun, S.; and Mitchell, T. 2000. Text classification from labeled and unlabeled documents using EM. Machine Learning 39(2-3):103–134.
[Pennington, Socher, and Manning 2014] Pennington, J.; Socher, R.; and Manning, C. D. 2014. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), 1532–1543.
[Řehůřek and Sojka 2010] Řehůřek, R., and Sojka, P. 2010. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, 45–50. Valletta, Malta: ELRA. http://is.muni.cz/publication/884893/en.
[Socher et al. 2013a] Socher, R.; Bauer, J.; Manning, C. D.; et al. 2013a. Parsing with compositional vector grammars. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 455–465.
[Socher et al. 2013b] Socher, R.; Perelygin, A.; Wu, J.; Chuang, J.; Manning, C. D.; Ng, A.; and Potts, C. 2013b. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 1631–1642.
[Srivastava, Salakhutdinov, and Hinton 2013] Srivastava, N.; Salakhutdinov, R.; and Hinton, G. 2013. Fast inference and learning for modeling documents with a deep Boltzmann machine.
[Trouillon et al. 2016] Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; and Bouchard, G. 2016. Complex embeddings for simple link prediction. In International Conference on Machine Learning, 2071–2080.
[Wang et al. 2014] Wang, Z.; Zhang, J.; Feng, J.; and Chen, Z. 2014. Knowledge graph embedding by translating on hyperplanes. In AAAI, volume 14, 1112–1119.
[Yang et al. 2014] Yang, B.; Yih, W.-t.; He, X.; Gao, J.; and Deng, L. 2014. Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575.
[Yao et al. 2017] Yao, L.; Zhang, Y.; Wei, B.; Jin, Z.; Zhang, R.; Zhang, Y.; and Chen, Q. 2017. Incorporating knowledge graph embeddings into topic modeling. In AAAI, 3119–3126.