=Paper=
{{Paper
|id=Vol-2350/paper11
|storemode=property
|title=Improving Topic Modeling for Textual Content with Knowledge Graph Embeddings
|pdfUrl=https://ceur-ws.org/Vol-2350/paper11.pdf
|volume=Vol-2350
|authors=Marco Brambilla,Birant Altinel
|dblpUrl=https://dblp.org/rec/conf/aaaiss/BrambillaA19
}}
==Improving Topic Modeling for Textual Content with Knowledge Graph Embeddings==
Marco Brambilla, Birant Altinel
Politecnico di Milano, DEIB
Piazza Leonardo da Vinci, 32. I-20133 Milano, Italy
{firstname.lastname}@polimi.it
Abstract

Topic modeling techniques have been applied in many scenarios in recent years, spanning textual content as well as many other data sources. Existing research in this field continuously tries to improve the accuracy and coherence of the results. Some recent works propose new methods that capture the semantic relations between words in the topic modeling process, by employing vector embeddings over knowledge bases. In this paper we study various dimensions of how knowledge graph embeddings affect topic modeling performance on textual content. In particular, the objective of the work is to determine which aspects of knowledge graph embedding have a significant and positive impact on the accuracy of the extracted topics. In order to obtain a good understanding of the impact, all steps of the process are examined and various parameterizations of the techniques are explored. Based on the findings, we improve the state of the art with the use of more advanced embedding approaches and parameterizations that produce higher quality topics. The work also includes a set of experiments with 2 variations of the knowledge base, 7 embedding methods, and 2 methods for incorporation of the embeddings into the topic modeling framework, also considering a set of variations of topic number and embedding dimensionality.

Copyright held by the author(s). In A. Martin, K. Hinkelmann, A. Gerber, D. Lenat, F. van Harmelen, P. Clark (Eds.), Proceedings of the AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2019). Stanford University, Palo Alto, California, USA, March 25-27, 2019.

Introduction

In the current age of information, larger and larger amounts of data are generated and collected every second around the world. A significant portion of this data is in the form of textual content. The need for understanding this vast amount of textual content keeps increasing as everything in the world becomes more data-driven, but mostly because it is impossible for us to do it manually.

The fields of Natural Language Processing and Machine Learning offer automated methods to understand large amounts of textual data. Vector representations of words (Mikolov et al. 2013) (Řehůřek and Sojka 2010) (Pennington, Socher, and Manning 2014) (Joulin et al. 2016) have been used for many Natural Language Processing tasks such as syntactic parsing (Socher et al. 2013a) and sentiment analysis (Socher et al. 2013b), and they are also being used in the Topic Modeling field (Hinton and Salakhutdinov 2009) (Srivastava, Salakhutdinov, and Hinton 2013) (Cao et al. 2015) (Nguyen et al. 2015) (Yao et al. 2017). One of these papers, with a method called KGE-LDA (Yao et al. 2017), aims to improve the performance of topic modeling by obtaining the vector representations of words from external knowledge bases such as WordNet (Miller 1995) and Freebase (Bollacker et al. 2008) instead of learning them from documents. According to their reported results, this approach is successful and improves topic coherence by 9.5% to 44% and document classification accuracy by 1.6% to 5.4% compared to LDA (Blei, Ng, and Jordan 2003).

Their approach improves the results with one specific method to obtain the word representations, but it is not clear whether vectors obtained through other methods that can capture the semantics of networks better are able to boost the accuracy of topic modeling. The vector embedding methods that have proven to be more successful in other fields, such as Link Prediction, can possibly capture the semantics of the external knowledge base more accurately.

Another question that remains to be answered in this context is whether a larger knowledge base in terms of entities, or a denser knowledge base in terms of relations between entities, can also contribute to better representations of words. The primary motive for this question lies in the fact that knowledge graphs do not have a complete semantic representation of the real world, and can be improved with different relations between entities.

This paper presents two approaches to improve Topic Modeling. The first approach applies various Multi-relational Network Embedding Methods by computing the vectors on the same network, and incorporating the results into the topic modeling framework that has been taken as the base method of this work. The mentioned embedding methods all follow a translation-based approach to vectors, with incremental improvements over the original work, TransE (Bordes et al. 2013). Since knowledge embeddings are increasingly used for topic modeling, there is a lack of a comprehensive study that discovers the effects of the knowledge encoded by the various methods. Therefore, the primary motive of this work is to push the state of the art in this field forward by the application of more advanced methods and knowledge bases for obtaining better knowledge graph embeddings in order to improve topic modeling.
The second approach modifies the network of the
knowledge graph itself, and manages to significantly in-
crease the density of the network by adding syntactic depen-
dency relations between words in a sentence that are com-
puted from the same text corpus used for the topic model-
ing. This combination is performed by computing the depen-
dency trees of the sentences in the text corpus, and adding
each relation to the knowledge graph between the corre-
sponding entities, thus updating and enlarging the network.
It studies the knowledge encoded by this denser network in
terms of relations between entities, and how it affects the
overall performance of embeddings, and consecutively topic
modeling.
The paper is organized as follows: Section 2 presents the related work. Section 3 details the methods employed in this paper. Section 4 describes the source code and implementations of the used methods. Section 5 presents the results of the experiments and discusses these outcomes. Section 6 concludes and outlines possible future work.
Related Work

In this section, the existing works in the literature that constitute the basis for the main focus and direction of this paper are discussed. KGE-LDA (Yao et al. 2017) is directly the baseline work on topic modeling with knowledge graph embeddings that this paper is focused on. On the other hand, LF-LDA (Nguyen et al. 2015) is an older method that introduced the idea of using embeddings of words to improve topic modeling. The discussion of these methods is aimed at creating a general perspective for the main idea and experiments that are proposed in this paper.

KGE-LDA (Yao et al. 2017) is a knowledge-based topic model that combines the well-known LDA model with entity embeddings obtained from knowledge graphs. It proposes two topic models that incorporate the vector representations of words, obtaining them from knowledge bases such as WordNet (Miller 1995) and Freebase (Bollacker et al. 2008). The two topic models are based on the previous works CI-LDA (Newman, Chemudugunta, and Smyth 2006) and Corr-LDA (Blei and Jordan 2003). The contributions of that paper create the foundations that this work studies and attempts to improve; the topic models of KGE-LDA are used here. Their claim and results show that the knowledge encoded from knowledge graphs captures the semantics better than the compared methods. In order to handle the embeddings, they propose a Gibbs Sampling inference method.

KGE-LDA extends two entity topic models, namely CI-LDA (Newman, Chemudugunta, and Smyth 2006) and Corr-LDA (Blei and Jordan 2003), in order to incorporate the learned entity embeddings into the topic model. The model based on CI-LDA is referred to as KGE-LDA(a) and the model based on Corr-LDA is referred to as KGE-LDA(b) in the paper and also throughout this work. The details regarding these approaches are discussed in the following subsections. The graphical representation of the models can be seen in Figure 1.

Figure 1: The representation of both KGE-LDA(a) and KGE-LDA(b) models (Yao et al. 2017)

LF-LDA, which stands for Latent Feature LDA, aims to improve topic modeling by incorporating latent feature vectors, with a similar point of view as KGE-LDA. The difference is that, apart from being published before KGE-LDA, this paper obtains the latent feature representations directly from the text corpus itself. It uses the well-known word2vec (Mikolov et al. 2013) method to compute the embeddings on a large text corpus, to be used later on a smaller corpus for topic modeling. Its main contribution that is relevant to this paper consists of using large external data to compute the word embeddings. LF-LDA extends two topic models, LDA (Blei, Ng, and Jordan 2003) and DMM (Nigam et al. 2000), by adding a latent feature component to the Dirichlet multinomial component that generates the words from topics in each topic model (Nguyen et al. 2015). The extended methods are called LF-LDA and LF-DMM. The graphical representation of LF-LDA can be seen in Figure 2.

Figure 2: Representation of the LF-LDA model (Nguyen et al. 2015)

Improving Knowledge Graph Embeddings for Topic Modeling

The focus of this paper is to explore the improvements in knowledge graph embeddings and their effects on topic modeling performance. There are three explored dimensions in the knowledge graph embedding process that are presumed to have a direct effect on performance. These dimensions are embedding method performance, the information in the knowledge base, and the vector dimension of the embeddings. This section describes these dimensions and how to explore them.

Embedding Methods Application

The following models are chosen for running the experiments. TransE (Bordes et al. 2013) is the model used by the authors of KGE-LDA (Yao et al. 2017), whereas the following models of the respective papers are chosen because they are either directly or indirectly compared with TransE and each other, which provides us with a better understanding of the difference in their performance.

The mentioned papers each improve the state of the art in knowledge graph embedding. The presumption is that the models which improve upon the results of TransE on other grounds, such as Link Prediction, should also deliver similar improvements in Topic Modeling results. To create a comparison on equal grounds, all of these models should be trained with the same dataset and the same parameters, and produce an output of the same embedding dimension. By keeping all other variables the same, it is possible to directly observe the quality of the embeddings for the purpose of topic modeling. The result of this approach helps determine the methods and the configurations which move the state of the art further by producing the highest accuracy in topic modeling.

The following subsections explain the main characteristics and differences of the compared embedding methods.

TransE. The TransE model represents the relations in the graph as translations in the embedding space (Bordes et al. 2013). For example, in a triple (head, relation, tail), the vector arithmetic equation head + relation = tail should hold true. In this model, a null relation vector would represent the equivalence of the head and tail entity. This also means that, if the semantics of the graph are captured correctly, the result of the vector arithmetic vector("France") - vector("Paris") + vector("Rome") should create a vector that is closest to vector("Italy") in the knowledge graph (Mikolov et al. 2013), with the assumption that the triples (Paris, capitalof, France) and (Rome, capitalof, Italy) or similar semantic relations exist. As stated before, TransE is part of the baseline method KGE-LDA that the following methods are compared to in the experiments.

TransH. The TransH model models relations as hyperplanes, in addition to the translation operations of TransE (Wang et al. 2014). The motive is the fact that there are mapping properties like reflexive, one-to-many, many-to-one and many-to-many, and there is a need to represent these mapping properties. Their claim is that TransE was not successful in preserving these properties.

DistMult. This model also directly aims to improve on the TransE model, and the main difference is the composition of vectors. Different from TransE, where vectors are composed by addition as explained in the previous subsections (head + relation = tail), DistMult composes vectors by a weighted element-wise dot product, in other words the multiplicative operation head × relation = tail (Yang et al. 2014).

TransR. The TransR model attempts to tackle the problem that a single semantic space for modeling the embeddings of all entities and relations is insufficient (Lin et al. 2015b). Building on TransE and TransH, it builds entity and relation embeddings in separate semantic spaces.

PTransE. PTransE builds upon the previous methods by utilizing multiple-step relation paths in the knowledge graph. Their approach is similar to TransE, with the addition of relation path-based learning (Lin et al. 2015a). In simple words, they join consecutive relations in a path into a single relation, such as relation1 ◦ relation2 = relation path, and use these paths in the model.

HolE. Short for "Holographic Embeddings", the difference that this model adopts is the learning of the compositional vector space representation of entire knowledge graphs (Nickel et al. 2016). It uses circular correlation as the compositional operator. The results of HolE are compared to TransE, TransR and other embedding methods in the published paper. One interesting fact is that HolE was proved to be equivalent to another method called ComplEx (Trouillon et al. 2016), which was published the same year (Hayashi and Shimbo 2017). Because of this fact, ComplEx was excluded from the experimentation in this work.

Analogy. Analogy proposes the optimization of latent feature representations with respect to the analogical properties of the embeddings of both entities and relations (Liu, Wu, and Yang 2017). It also unifies several methods in multi-relational embedding, namely DistMult (Yang et al. 2014), ComplEx (Trouillon et al. 2016) and HolE (Nickel et al. 2016). It is also compared to all previous methods mentioned in this paper in the experiments of the published paper.

In Table 1, the time and space complexities along with the scoring functions of the described methods are compared.

Knowledge Graph Extension with Dependency Trees

While the previous sections observe the effects of the embedding models and process, this section focuses on the density and quality of the knowledge graphs with which the embedding models are trained.

Therefore, as a source of new information for the knowledge graph, the text corpus itself is a great answer. The dependency relations in sentences constitute meaningful semantics, and a quite massive source of information. The questions that remain to be answered are: are the semantic relations in a knowledge graph and a dependency graph compatible with each other? Are they able to create a richer knowledge base? Are the current embedding methods able to capture the information encoded in the resulting massive graph?
Table 1: Characteristics of the different Embedding Methods. Parameters: d: embedding size, n_e: number of entities, n_r: number of relations, h: head entity, r: relation, t: tail entity, w_r: vector representation of r, M_r: projection matrix of r, p: path

Method    Time Complexity  Space Complexity             Scoring Function
TransE    O(d)             O(n_e d + n_r d)             −‖h + r − t‖_{1/2}
TransH    O(d)             O(n_e d + 2 n_r d)           −‖(h − w_r⊤ h w_r) + r − (t − w_r⊤ t w_r)‖²₂
DistMult  O(d)             O(n_e d + n_r d)             h⊤ diag(r) t
PTransE   O(d)             O(n_e d + n_r d)             −‖p − (t − h)‖
TransR    O(d²)            O(n_e d + n_r d + n_r d²)    −‖M_r h + r − M_r t‖²₂
HolE      O(d log d)       O(n_e d + n_r d)             r⊤ (h ⋆ t)
Analogy   O(d)             O(n_e d + n_r d)             h⊤ M_r t
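For illustration, three of the scoring functions in Table 1 can be sketched in plain Python. This is a minimal sketch over toy vectors, not trained embeddings; the function names are ours, and ⋆ in HolE is implemented as circular correlation by its textbook definition:

```python
import math

def transe_score(h, r, t):
    # TransE: a triple (h, r, t) is plausible when h + r ≈ t,
    # so the score is the negative L2 translation error.
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def distmult_score(h, r, t):
    # DistMult: multiplicative composition h^T diag(r) t.
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))

def hole_score(h, r, t):
    # HolE: circular correlation of h and t, matched against r:
    # [h * t]_k = sum_i h_i * t_{(i+k) mod d}.
    d = len(h)
    corr = [sum(h[i] * t[(i + k) % d] for i in range(d)) for k in range(d)]
    return sum(ri * ci for ri, ci in zip(r, corr))

# A perfectly "translated" toy triple has zero error under TransE:
print(transe_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]))
print(distmult_score([1, 2], [3, 4], [5, 6]))
print(hole_score([1, 2], [3, 4], [5, 6]))
```

Higher scores mean more plausible triples under each model; during training the methods rank observed triples above corrupted ones.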
To answer these questions, the knowledge graph used in this paper (WN18) was merged with the dependency graph obtained from the 20NG text corpus, which is also used in this paper for topic modeling. As detailed in the Datasets subsection of the Experiments section, the density of the graph increased about 5 times, which surely created a more complex semantic structure.

The general structure of the merging phase is illustrated in Figure 3. The process finds the dependency tree of each sentence. Then, the corresponding entity of each word in the knowledge graph is found. If the words and the computed dependencies pass the filtering stage, a new link is added between the corresponding entities in the knowledge graph, with the name of the dependency relation.

Figure 3: Visualization of the Knowledge Graph Extension (panels: Dependency Tree of a Sentence; Knowledge Graph; Extended Knowledge Graph)

Further Exploration of Parameters

This section aims to increase the primary parameters to measure their effects on the final outcome. The motive is that, as long as computational limits and feasibility allow, better parameters and settings should be used if they provide considerable improvements in performance. In the light of this motive, the following aspects are considered.

The first aspect to be investigated is the effect of the embedding dimension on topic modeling performance. The motive for this aspect is the fact that, as the knowledge graph or the dataset gets larger and denser, it creates more information to be stored in the embeddings. Larger vector dimensions offer more space to encode the semantics, but this naturally comes with performance costs.

Furthermore, the number of topics chosen for the topic model also has a direct effect on performance. Considering the results in KGE-LDA (Yao et al. 2017), where the accuracy increases with topic number, a significantly increased topic number and its impact should be observed.

Lastly, the extended knowledge graph method that was described in the previous section should also be examined with the increased parameters, as the information encoded from a larger graph might even provide greater performance with higher dimensional embeddings and higher topic numbers.

Implementation

Base Topic Modeling Framework

To merge the learned embeddings with the topic modeling process, the original implementation of KGE-LDA by its authors was used¹. The original implementation was chosen because KGE-LDA is the baseline work that this paper follows; thus it is the best choice for running the experimentations.

The source code is structured as a Java project, and has a dependency on the Stanford CoreNLP library. Along with KGE-LDA, the project contains the implementations for LDA (Blei, Ng, and Jordan 2003) and CTM (Blei and Lafferty 2006). Several alterations and additions were made in the implementation for the third part of the experiments (Knowledge Graph Extension). The additions are as follows:

• Parsing the 20NG dataset with the CoreNLP DependencyParser to obtain dependency trees.
• Updating the WN18 graph with the obtained dependencies.
• Various minor alterations throughout the source code.

Embedding Methods

For the purpose of the experimentations for the Embedding Method Comparison, the implementations of the chosen embedding methods were needed. Therefore, the implementations of TransE, TransH, TransR and PTransE were taken from the open-source project KB2E².

¹ https://github.com/yao8839836/KGE-LDA
² https://github.com/thunlp/KB2E/
The implementations of DistMult, HolE and Analogy were taken from the open-source project OpenKE³.

Dependency Parser

For the purpose of the Knowledge Graph Extension part of this paper, the Stanford CoreNLP DependencyParser annotator was used. Using the DependencyParser, the code for the Knowledge Graph Extension part was implemented in Java. The process and the implementation follow this algorithm:

Algorithm 1: Knowledge Graph Extension with Dependency Trees
  KnowledgeGraph ← WN18
  DependencyNetwork ← empty graph
  for Document d in 20NG do
    for Sentence s in d do
      t ← DependencyParser(s)
      DependencyNetwork append t
    end
  end
  KnowledgeGraph merge DependencyNetwork
  return KnowledgeGraph

To visualize how the dependency relations are merged with the knowledge graph, please refer to Figure 4.

Figure 4: An Example of Merging a Dependency Relation from a Sentence with the Knowledge Graph (the sentence "Furthermore, sales of satellite ground equipment should go up in the next revision of this data." is parsed; the "Compound" relation between "Equipment" and "Satellite" is added alongside the existing "Hyponym" relation)

The example in Figure 4 shows how a dependency relation extracted from a sentence updates the knowledge graph. In this specific example, there is a "Hyponym" relation from the "Equipment" entity to the "Satellite" entity in the knowledge graph. The dependency parser finds out that these two words are used in a compound in the corresponding sentence, and updates the knowledge graph with the "Compound" relation.

Experiments

In this section, a series of experiments that involve different methods and variations of parameters are presented. The used datasets, along with the chosen parameters, are stated for each of the different experiment sets.

The experiments are conducted to find answers to the following questions:
1. Are newer and improved embedding models able to capture better semantics for the purpose of topic modeling?
2. How does the number of topics affect the performance of these sets of methods?
3. Does a denser and more complex knowledge base create a better or worse encoding of entities?
4. What is the importance of the vector dimensions in capturing and encoding information? Do we need larger vectors for more accurate representations for the used datasets?

The experiments are grouped into three categories that each try to answer the corresponding questions stated above. We proceed with three sets of experiments: (1) Embedding Method Application and Comparison; (2) Knowledge Graph Extension; (3) Further Exploration of Parameters.

Baselines

Two topic models are chosen to compare the results of the experiments with:

• LDA (Blei, Ng, and Jordan 2003)
• KGE-LDA (Yao et al. 2017)

LDA was chosen as the primary indicator of performance because it is the most widely used topic model, which is considered the baseline method for many other works in the field. KGE-LDA was chosen as the main indicator of performance since it is the baseline method and starting point of this work.

Datasets

Text Corpus. The datasets in the context of this work refer to the text corpus that is used to run the topic models. For this purpose, the 20-Newsgroups (20NG) dataset was used. The dataset includes 18,846 documents, split into 20 categories, with a vocabulary of 20,881 distinct words. In the text preprocessing phase, the following steps are applied to the data: tokenization (with Stanford CoreNLP), stopword removal, and rare word removal (for words that appear less than 10 times throughout the dataset).

External Knowledge. The external knowledge refers to the knowledge graph that was used to train the representation learning methods to obtain the word embeddings. WN18, which is a subset of the widely used lexical knowledge graph WordNet, was used for this purpose. WN18 has the following characteristics in the training set: 141,442 triplets (the missing 10,000 triplets of WN18 are in the test and validation sets); 40,943 entities; 18 types of relations; 8,819 common entities with the 20NG vocabulary.

Table 2 shows the top 10 occurring relation types in the knowledge graph, their occurrence counts, and their percentages in size over the whole graph.

³ https://github.com/thunlp/OpenKE
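The extension procedure of Algorithm 1 and Figure 4 can be sketched as follows. This is an illustrative sketch, not the paper's Java code: `parse_dependencies` is a hypothetical stand-in for the CoreNLP DependencyParser that here just returns canned (governor, relation, dependent) triples for the running example:

```python
def parse_dependencies(sentence):
    # Hypothetical stand-in for the CoreNLP DependencyParser:
    # returns canned dependency triples for the Figure 4 example.
    canned = {
        "satellite ground equipment": [("equipment", "compound", "satellite")],
    }
    return canned.get(sentence, [])

def extend_knowledge_graph(knowledge_graph, corpus, word_to_entity):
    """Add one (entity, dep_relation, entity) triple per dependency edge
    whose two words both map to entities in the knowledge graph."""
    extended = set(knowledge_graph)
    for document in corpus:
        for sentence in document:
            for head, relation, dependent in parse_dependencies(sentence):
                # Filtering stage: keep only word pairs that have
                # corresponding entities in the knowledge graph.
                if head in word_to_entity and dependent in word_to_entity:
                    extended.add((word_to_entity[head], relation,
                                  word_to_entity[dependent]))
    return extended

# Toy run mirroring Figure 4: a WN18-style triple plus one parsed sentence.
kg = {("equipment", "hyponym", "satellite")}
corpus = [["satellite ground equipment"]]
entities = {"equipment": "equipment", "satellite": "satellite"}
print(sorted(extend_knowledge_graph(kg, corpus, entities)))
```

After the merge, the graph contains both the original "Hyponym" triple and the new "Compound" triple between the same two entities, which is exactly the densification measured in the Extended Knowledge Graph subsection.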
Table 2: Occurrence Counts and Percentages of Top 10 Relations in the Original WN18 Dataset

Relation                     Count   Percentage of Graph
Hyponym                      34832   24.6%
Hypernym                     34796   24.6%
Derivationally Related Form  29715   21.0%
Member Meronym               7402    5.23%
Member Holonym               7382    5.22%
Has Part                     4816    3.40%
Part Of                      4805    3.40%
Member of Domain Topic       3118    2.20%
Synset Domain Topic of       3116    2.20%
Instance Hyponym             2935    2.08%

Extended Knowledge Graph. As mentioned before, the knowledge graph in the previous subsection was merged with the dependency graph obtained from the 20NG text corpus. The resulting graph has the following characteristics, which have increased relative to the original knowledge graph (WN18):

• 817,568 triplets, with respect to the original 141,442;
• 55 types of relations, increased from the original 18.

There were new relations introduced to the knowledge graph, but no new entities. To demonstrate how the knowledge graph changed, Table 3 lists the top 10 occurring relation types, their occurrence counts, and their percentages in size over the whole graph.

Table 3: Occurrence Counts and Percentages of Top 10 Relations in the Extended WN18 Dataset

Relation             Count   Percentage of Graph
Root                 30117   15.0%
Nominal Modifier     90531   11.1%
Compound             78654   9.62%
Direct Object        56423   6.90%
Adjectival Modifier  53819   6.58%
Dependent            35930   4.39%
Hyponym              34832   4.26%
Hypernym             34796   4.26%
Conjunct             33223   4.06%
Auxiliary            30775   3.76%

It can be seen that the structure of the knowledge graph has changed substantially, with the high number of additions. With the extension, the size of the graph grew by 578% compared to the original knowledge graph, and 37 new relation types were added.

Settings

A set of settings of the different parameters has been defined for the execution and validation of the approach. Some parameters have been adopted with a constant value across the experiments, while others vary across experiments. The settings considered include:

1. Settings for Embedding Methods Comparison: all parameters have been fixed, except for the number of topics (and the respective parameter α), as reported in Table 4.

2. Settings for Knowledge Graph Extension: the settings are the same as those of the Embedding Methods Comparison group.

3. Settings for Further Exploration of Parameters: with the aim of delving into a detailed investigation of the parameter values, a further set of experiments with new variations of the settings has been launched, with values as reported in Table 5. With respect to the initial experiments (parametrized as in point 1 of this list), the embedding dimension is increased to 100 and the number of topics is increased to 100.

Table 4: Embedding Methods Comparison Settings

Parameter Name            Parameter Value
Embedding Dimension       50
Gibbs Sampling Iterations 1000
Learning Rate             0.001
Hyperparameter α          50/K (#Topics)
Hyperparameter β          0.01
Number of Topics (K)      20, 30, 40, 50

Table 5: Further Exploration Experiment Settings

Parameter Name       Parameter Value
Embedding Dimension  100
Topic Number         50, 100

Results

The results are obtained through two different evaluation mechanisms, namely Topic Coherence and Document Classification. The UCI method, which uses Pointwise Mutual Information (Newman et al. 2010), was used for Topic Coherence, and the LIBLINEAR linear classification library (Fan et al. 2008) was used for Document Classification. In the rest of the section, these results are presented and discussed.

Embedding Methods Comparison

Topic Coherence Results. As stated before, PMI-based topic coherence was used to obtain these results. To compute PMI, a dataset of 4,776,093 Wikipedia articles was used. For each method and topic number, the experiments were run 5 times, after which the average and the standard deviation were calculated. The results can be found in Table 6.
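A minimal sketch of the PMI-based UCI coherence (Newman et al. 2010) may clarify the metric; here document-level co-occurrence in a toy reference corpus stands in for the Wikipedia dump, and the function name and smoothing constant are our own choices:

```python
import math
from itertools import combinations

def uci_coherence(topic_words, reference_docs, eps=1e-12):
    """Average PMI over all pairs of top topic words, with probabilities
    estimated from document-level occurrence in a reference corpus."""
    n_docs = len(reference_docs)
    docs = [set(d) for d in reference_docs]
    def p(*words):
        # Fraction of reference documents containing all given words.
        return sum(all(w in d for w in words) for d in docs) / n_docs
    pmis = [math.log((p(w1, w2) + eps) / (p(w1) * p(w2) + eps))
            for w1, w2 in combinations(topic_words, 2)]
    return sum(pmis) / len(pmis)

# Words that always co-occur get positive PMI (log 2 here);
# words that never co-occur are penalized.
docs = [["satellite", "orbit"], ["satellite", "orbit"], ["window"], ["window"]]
print(uci_coherence(["satellite", "orbit"], docs))
```

A coherent topic is one whose top words tend to appear in the same reference documents, so higher average PMI means a more interpretable topic.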
Figure 5: Topic Coherence Scores of Topic Modeling Obtained Through Different Embedding Methods (TransE, TransH, DistMult, PTransE RNN, TransR, HolE, Analogy) with the Incorporation Model A, Separated by Topic Number (20, 30, 40, 50)

Figure 6: Topic Coherence Scores of Topic Modeling Obtained Through Different Embedding Methods with the Incorporation Model B, Separated by Topic Number (20, 30, 40, 50)
Table 6: Topic Coherence Results of Embedding Methods see TransR performing better than other methods. It is also
Table 6: Topic Coherence Results of Embedding Methods on Topic Modeling. The best results are reported in bold.

Model            K = 20       K = 30       K = 40       K = 50
LDA              68.4±2.63    72.5±1.87    70.9±1.74    71.6±0.45
TransE(a)        68.8±3.56    70.6±2.08    69.6±1.13    71.4±1.82
TransE(b)        70.2±1.79    70.3±0.52    70±1.56      71±1.41
TransH(a)        67.5±2.4     70.1±1.4     69.7±0.99    70.3±1.37
TransH(b)        68.8±1.25    69.5±1.51    70.5±0.8     70.4±0.35
DistMult(a)      68.7±1.97    69.8±0.95    70.7±1.34    71.3±1.62
DistMult(b)      67.8±2.11    69.9±2.25    70±0.79      70.2±1.8
PTransE RNN(a)   70.6±3.1     69±1.66      69.6±1.81    70.6±1.93
PTransE RNN(b)   68.8±3.21    69.8±1.8     70.3±1.92    69.8±0.96
TransR(a)        72.3±2.41    68±1.27      69.5±0.95    70.6±1.79
TransR(b)        66.7±2.09    69.5±1.84    68.4±1.59    70.9±0.54
HolE(a)          69.2±3.51    70.3±1.78    70.4±1.21    70.2±2.14
HolE(b)          68.8±1.23    69.6±2.33    70.6±2.2     70.4±1.78
Analogy(a)       69±1.88      70.3±2.5     69.4±2.1     71±0.52
Analogy(b)       68.7±2.25    70.4±1.24    69.4±2.44    72.6±1.41

Overall Topic Coherence Results  The best and second-best coherence scores differ for each topic number, and it should be noted that the performance of the original LDA is consistently good. TransR leads to more coherent topics at lower topic numbers, while Analogy performs best at higher topic numbers. The general trend shows improvement with higher topic numbers.

Model A on Topic Coherence  For 30, 40, and 50 topics, the topic coherence results are close to each other and fall in the same range. The only significant difference in coherence can be observed at 20 topics, where it is worth mentioning that TransR performs better than at higher topic numbers while performing worst at 30 topics. With 20 topics, the standard deviation is also higher than at higher topic numbers, and both the best (TransR) and worst (TransH) scores of all the combinations occur there.

Model B on Topic Coherence  With Model B, there is also a general trend of improvement with topic number, and the standard deviation generally gets smaller as the topic number increases. TransR scores the lowest at 20 topics, even though it scored the highest at 20 topics with Model A. The highest-scoring combination is the Analogy method with 50 topics.

Document Classification Results  The documents have been classified using LIBLINEAR (Fan et al. 2008). For each method and topic number, the experiments were run 5 times; the averages and standard deviations are reported in Table 7.

Overall Document Classification Results  Table 7 shows that across topic numbers 20, 30, 40, and 50, HolE and Analogy perform best overall. On average, Model A yields slightly better scores than Model B, even though Analogy performs better with Model B. Another observation is that performance almost always increases with topic number, and that the results for 40 and 50 topics are closer to each other than for other topic-number increments.
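The classification setup described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the document-topic proportions and class labels are random stand-ins for the output of the trained topic models, and scikit-learn's LinearSVC (which wraps LIBLINEAR) stands in for the LIBLINEAR setup.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
K = 20        # number of topics, as in the K = 20 column
n_docs = 200  # hypothetical corpus size

# Hypothetical document-topic proportions; in the paper these come from
# the (KGE-)LDA models, here they are random Dirichlet stand-ins.
theta = rng.dirichlet(np.ones(K), size=n_docs)
labels = rng.integers(0, 4, size=n_docs)  # hypothetical class labels

# LinearSVC is backed by LIBLINEAR (Fan et al. 2008); accuracy is averaged
# over folds, mirroring the repeated-run averages reported in Table 7.
clf = LinearSVC(C=1.0, max_iter=10000)
scores = cross_val_score(clf, theta, labels, cv=5)
print(scores.mean(), scores.std())
```

With random stand-in data the accuracy is near chance; only with real topic proportions do the differences between embedding methods become visible.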
Table 7: Classification Results of Embedding Methods on Topic Modeling. The best results are reported in bold.

Model            K = 20         K = 30         K = 40         K = 50
LDA              0.539±0.028    0.633±0.022    0.695±0.022    0.69±0.022
TransE(a)        0.57±0.024     0.677±0.013    0.705±0.011    0.694±0.017
TransE(b)        0.554±0.017    0.670±0.017    0.676±0.022    0.714±0.006
TransH(a)        0.567±0.032    0.668±0.027    0.71±0.019     0.714±0.009
TransH(b)        0.555±0.014    0.666±0.035    0.694±0.013    0.697±0.024
DistMult(a)      0.59±0.021     0.644±0.015    0.706±0.019    0.702±0.026
DistMult(b)      0.587±0.017    0.667±0.014    0.687±0.014    0.694±0.025
PTransE RNN(a)   0.567±0.024    0.667±0.024    0.701±0.012    0.709±0.010
PTransE RNN(b)   0.576±0.016    0.659±0.015    0.684±0.024    0.701±0.021
TransR(a)        0.574±0.012    0.656±0.018    0.687±0.022    0.716±0.011
TransR(b)        0.555±0.035    0.662±0.022    0.692±0.005    0.695±0.026
HolE(a)          0.597±0.032    0.679±0.032    0.697±0.021    0.707±0.004
HolE(b)          0.563±0.022    0.668±0.034    0.684±0.026    0.713±0.017
Analogy(a)       0.579±0.014    0.641±0.037    0.704±0.022    0.715±0.009
Analogy(b)       0.554±0.004    0.687±0.017    0.676±0.022    0.719±0.006

Figure 8: Document Classification Accuracy of Topic Modeling Obtained Through Different Embedding Methods with the Incorporation Model B, Separated by Topic Number
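The embedding methods compared above differ mainly in how they score a knowledge-graph triple (h, r, t). As a rough illustration (simplified scoring functions only, not the full training objectives or the exact implementations used here), TransE and DistMult can be sketched as:

```python
import numpy as np

def transe_score(h, r, t):
    # TransE (Bordes et al. 2013): translation-based; the tail embedding t
    # should lie close to h + r, so a smaller distance means a better score.
    return -np.linalg.norm(h + r - t, ord=2)

def distmult_score(h, r, t):
    # DistMult (Yang et al. 2014): bilinear score with a diagonal relation
    # matrix, i.e. an elementwise triple product.
    return float(np.sum(h * r * t))

rng = np.random.default_rng(42)
h, r, t = rng.normal(size=(3, 50))  # hypothetical 50-dimensional embeddings
print(transe_score(h, r, t), distmult_score(h, r, t))
```

Note that DistMult's score is symmetric in head and tail, a limitation that HolE and Analogy address with richer (circular-correlation and mixed block-diagonal) relation operators.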
Figure 7: Document Classification Accuracy of Topic Modeling Obtained Through Different Embedding Methods with the Incorporation Model A, Separated by Topic Number

Model A  The results of Model A show that at 20 topics, HolE and DistMult perform best; their approach is apparently better suited to small topic numbers, and Analogy performs close to them. At 30 topics, HolE again scores best; however, this time DistMult scores low, while TransE, TransH, and TransR, which employ addition-based translation, score better. At 40 topics, the performance of all methods converges, with all of them scoring more similarly than they do at other topic numbers. The 50-topic results are also relatively similar, with TransR, Analogy, and TransH scoring best.
The outcomes show that HolE is the best performer overall with Model A. Looking at the standard deviations, the methods seem to have similar consistency in their results with Model A.

Model B  The main difference of Model B is that it generates the entity embeddings by the topics in the same document, so it is important to note that the embeddings of the best-performing methods are a better fit for this approach.
The results of Model B reveal that at 20 topics, DistMult is the best performer along with PTransE RNN. At 30 topics, Analogy outperforms the others, which all score similarly to each other. At 40 topics, TransH and TransR score far better than the others. At 50 topics, Analogy outperforms the others, with TransE and HolE scoring close behind.
The outcomes show that Analogy and DistMult are the best performers overall with Model B. It is also important to note that Analogy gives more consistent results across multiple runs, which can be seen in its lower standard deviation compared to the other methods.

Knowledge Graph Extension

Topic Coherence Results  The topic coherence experiments were run according to the parameters specified before. Each experiment was run 5 times, with the averages and standard deviations reported in Table 8.
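For reference, the kind of coherence measure used in such evaluations can be illustrated with a small reimplementation in the spirit of the UMass document co-occurrence measure (cf. Newman et al. 2010). This is a hedged sketch with toy documents, not the exact scorer or parameters used in these experiments:

```python
import math
from itertools import combinations

def umass_coherence(top_words, docs, eps=1.0):
    """Sum over ranked word pairs of log((D(w_i, w_j) + eps) / D(w_i)),
    where D counts documents containing the given words. Assumes every
    top word occurs in at least one document."""
    doc_sets = [set(d) for d in docs]
    def d_count(*words):
        return sum(all(w in ds for w in words) for ds in doc_sets)
    score = 0.0
    for w_earlier, w_later in combinations(top_words, 2):
        score += math.log((d_count(w_earlier, w_later) + eps) / d_count(w_earlier))
    return score

# Toy corpus: each inner list is one tokenized document.
docs = [["graph", "embedding", "topic"],
        ["topic", "model"],
        ["embedding", "vector", "graph"],
        ["graph", "topic"]]
print(umass_coherence(["graph", "topic", "embedding"], docs))
```

Higher (less negative) values indicate that the topic's top words co-occur more often, which is what the tables in this section compare across embedding methods.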
Table 8: Topic Coherence Results with Knowledge Graph Extension. The best results are reported in bold.

Model            K = 20       K = 30       K = 40       K = 50
Orig. K.G. (a)   68.8±3.56    70.6±2.08    69.6±1.13    71.4±1.82
Orig. K.G. (b)   70.2±1.79    70.3±0.52    70±1.56      71±1.41
Ext. K.G. (a)    70.5±2.44    69.5±1.08    70.4±1.44    70.7±2.56
Ext. K.G. (b)    68.1±0.48    70.1±2.13    71.5±3.01    71.3±0.7

Topic Coherence with Knowledge Graph Extension Overview  The results in Table 8 show that the Extended Knowledge Graph led to results similar to those of the Original Knowledge Graph. An overall inspection of the table shows that the best performances are distributed across different models and graphs. The version with the Extended Knowledge Graph provided better average scores for 20 and 40 topics. The overall trend is also similar to the topic coherence results of the previous section, as 30, 40, and 50 topics resulted in the same range of performance as each other.

Figure 9: Topic Coherence Scores of Topic Modeling Obtained Through Original Knowledge Graph and Extended Knowledge Graph, on Different Topic Numbers

Document Classification Results  The experiments in this section were also run 5 times, like the ones before. The averages and standard deviations are reported in Table 9.

Table 9: Document Classification Results with Knowledge Graph Extension. The best results are reported in bold.

Model            K = 20         K = 30         K = 40         K = 50
Orig. K.G. (a)   0.57±0.024     0.677±0.013    0.705±0.011    0.694±0.017
Orig. K.G. (b)   0.554±0.017    0.670±0.017    0.676±0.022    0.714±0.006
Ext. K.G. (a)    0.582±0.017    0.683±0.032    0.692±0.010    0.711±0.027
Ext. K.G. (b)    0.566±0.015    0.656±0.014    0.695±0.018    0.716±0.010

Document Classification with Knowledge Graph Extension Overview  The results in Table 9 show that the knowledge graph extension created better semantics in the graph, which in turn was reflected in the classification results. We see an overall improvement with both Model A and Model B, with the improvements for Model A being larger. The Extended Graph with Model A performs better at smaller topic numbers, whereas the extended graph with Model B is more accurate at larger topic numbers.

Figure 10: Document Classification Accuracies of Topic Modeling Obtained Through Original Knowledge Graph and Extended Knowledge Graph, on Different Topic Numbers

Increased Topic Number and Embedding Dimension  The experiments in this section correspond to the previous subsections' further exploration of parameters. For this purpose, an increased topic number of 100 and an increased embedding dimension of 100 were used with TransE and Analogy on the original knowledge graph, and additionally with TransE on the Extended Knowledge Graph. The averages and standard deviations obtained from 5 runs of each combination are reported in Tables 10 and 11.

Table 10: Topic Coherence Results with 100 Dimensional Embeddings. The best results are reported in bold.

Model                        K = 50       K = 100
TransE on Orig. K.G. (a)     70.1±1.1     73.4±1.71
TransE on Orig. K.G. (b)     71.5±1.89    73.5±1.11
Analogy on Orig. K.G. (a)    69.4±0.82    72.7±1.73
Analogy on Orig. K.G. (b)    70±0.81      75.1±2.21
TransE on Ext. K.G. (a)      70.1±1.44    72.7±0.8
TransE on Ext. K.G. (b)      71.5±0.19    73.4±0.84

According to the topic coherence scores, the extended knowledge graph provides better performance at 50 topics than both TransE and Analogy on the original graph. Even though it scores the same as the corresponding configuration on the Original Knowledge Graph, its standard deviation is 90% lower. At 100 topics, Analogy with Model B stands out with the highest coherence score obtained throughout the experiments of this work, scoring 2.18% higher than the closest coherence score. Figure 11 offers a clear visual comparison of these results.

Figure 11: Topic Coherence Scores of Topic Modeling Obtained Through Specified Method and Knowledge Graph Combinations, with 100 Dimensional Embeddings on Different Topic Numbers

Table 11: Document Classification Results with 100 Dimensional Embeddings. The best results are reported in bold.

Model                        K = 50         K = 100
TransE on Orig. K.G. (a)     0.712±0.020    0.725±0.009
TransE on Orig. K.G. (b)     0.705±0.009    0.724±0.006
Analogy on Orig. K.G. (a)    0.711±0.010    0.73±0.010
Analogy on Orig. K.G. (b)    0.706±0.010    0.727±0.010
TransE on Ext. K.G. (a)      0.712±0.011    0.734±0.002
TransE on Ext. K.G. (b)      0.693±0.019    0.726±0.013

The extended knowledge graph scores the highest document classification accuracy for both 50 topics and 100 topics with Model A. In fact, the Extended Graph with Model A at 100 topics scored the highest document classification accuracy throughout the experiments of this work, scoring 1.24% higher than the same configuration with the Original Knowledge Graph. At 50 topics, it scored the same average as the Original Knowledge Graph, but with a smaller standard deviation. According to these results, the Extended Knowledge Graph leads to better accuracy than the Original Knowledge Graph, with the exception of 50 topics with Model B. It also performs better than Analogy with Model A. These results can also be clearly seen in Figure 12.

Figure 12: Document Classification Accuracy of Topic Modeling Obtained Through Specified Method and Knowledge Graph Combinations, with 100 Dimensional Embeddings on Different Topic Numbers

Runtime Duration
The experiments were conducted on a computer with the following relevant technical specifications:
• Intel Core i5-8250U CPU @ 1.60GHz
• 8 GB of DDR4 RAM @ 1866 MHz

Throughout the experiments, the elapsed execution time was measured. The embedding methods were run only once to obtain the representations from the knowledge graph. The fastest embedding happened to be TransE, with approximately 1 hour of computation, and the slowest was HolE, with approximately 17 hours; all other methods ran for between 1 and 2 hours. It is safe to say that HolE was exceptionally slow during the training phase compared to the other methods.
The more crucial and overall more time-consuming part was running the topic models with the obtained representations. The duration of the topic modeling phase was not affected by which method produced the representations, as they all provide an output of the same size. However, the topic number and embedding size had a significant effect on the execution time. The average durations are reported in two separate tables: for an embedding size of 50 the results can be seen in Table 12, and for an embedding size of 100 in Table 13.

Table 12: Average execution time of Topic Modeling with 50-dimensional embeddings (in minutes) depending on the number of topics K.

          K = 20   K = 30   K = 40   K = 50
Model A   133      142      164      189
Model B   121      145      162      216
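The elapsed-time measurements reported in these tables can be collected with a simple wall-clock harness like the following sketch (a generic illustration, not the authors' instrumentation; `sum` over a range stands in for a topic-model training call):

```python
import time

def run_timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed time in minutes), matching the
    per-configuration durations reported in Tables 12 and 13."""
    start = time.perf_counter()  # monotonic wall clock, robust to system clock changes
    result = fn(*args, **kwargs)
    elapsed_min = (time.perf_counter() - start) / 60.0
    return result, elapsed_min

# Hypothetical usage with a cheap stand-in workload:
result, minutes = run_timed(sum, range(1_000_000))
print(result, round(minutes, 4))
```

Averaging `elapsed_min` over the 5 repeated runs per configuration gives the kind of per-cell values shown in the tables.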
Table 13: Average execution time of Topic Modeling with 100-dimensional embeddings (in minutes) depending on the number of topics K.

          K = 50   K = 100
Model A   217      340
Model B   209      314

To interpret the execution times more clearly, Figure 13 provides a visual representation. It can be seen in the figure that at K = 50, moving from 50-dimensional to 100-dimensional embeddings decreases the runtime duration by 3.3% with Model B and increases it by 14.5% with Model A.
However, an increase from 50 topics to 100 topics increases the runtime duration by 79.9% with Model A and 45.4% with Model B. Considering these facts together with the general trend of growth in the figure, it is safe to say that the topic number has a larger impact on the runtime duration than the embedding size during topic modeling.

Figure 13: Execution times (in minutes).

Discussion
The results on topic coherence throughout the three experiments share a similar pattern. From 30 topics upwards, the scores are very similar once the standard deviation is taken into account, with only a few results differing significantly. These scores also do not vary much between the different methods, for both incorporation models A and B. For the results at 20 topics, the differences between methods are larger. With the increased parameters of 100 topics and 100-dimensional embeddings, the highest score achieved is 75.1±2.21 by Analogy, which scored 72.6±1.41 with 50 topics and 50-dimensional embeddings. The topic coherence at 100 topics shows that the Analogy with Model B configuration is successful also with higher-dimensional embeddings and higher topic numbers.
Therefore, some inferences can be made about the effects of embedding methods on topic coherence. The coherence increases with topic number on average, but inconsistently: a general trend of increase is seen, except at 40 topics, which resulted in lower coherence scores in general than 30 topics. The different embedding methods produce topic coherence results that are within a 1.2% range of each other on average. Analogy with Model B leads to the highest coherence scores at high topic numbers. The extended knowledge graph clearly improved the document classification accuracy, with the exception of 40 topics; the improvements in topic coherence are at 20 and 40 topics.
For general-purpose use, Analogy is a clear choice over DistMult and HolE. The first reason is that Analogy is a generalized method which can reproduce DistMult and HolE with a suitable selection of parameters, so it allows a wider range of performance and parameters; a grid search should therefore be able to find a configuration that is better than DistMult and HolE. The second reason is that even though HolE and Analogy with the same parameters perform quite similarly, it takes much longer to train HolE (∼17 hours) than Analogy (∼1-2 hours). Much faster training, while theoretically being able to produce the same results as HolE, makes Analogy the more feasible option.
The document classification evaluation produced results that are clearer and easier to interpret in general. With small exceptions, an increased topic number produced better results. In the embedding method comparison, some of the newer and more complex embedding methods, such as DistMult, HolE, and Analogy, led to higher classification accuracy. Model A is on average 1% better than Model B, but the two produce equally consistent results, with the same standard deviation of 1.9% on average.
On the other hand, there are clear improvements in document classification accuracy when the Extended Knowledge Graph is used to train the embedding methods. This means that the semantic structure of the knowledge graph was enhanced, which was reflected in better vector representations of entities and relations.
In the last group of experiments, the Extended Knowledge Graph provides better results than TransE and Analogy on the Original Knowledge Graph, with an accuracy of 0.734±0.002, the highest accuracy recorded throughout the experiments in this work.
In light of these outcomes, the following inferences are made about the effects of embedding methods on document classification. The accuracy consistently increases with topic number. Changes in embedding method performance are reflected in the document classification accuracy. Analogy with Model B leads to the highest accuracy scores at high topic numbers. The extended knowledge graph led to increased accuracy, showing that dependency trees enhanced the semantics of the knowledge graph.

Conclusion
This paper explored the incorporation of knowledge graph embeddings into topic modeling, by experimenting on various aspects and identifying ways to improve. These aspects were the semantic information in the source knowledge graph, the different embedding methods, and the performance effects of topic numbers and embedding dimensions. The performance of 7 embedding methods, 2 topic models, 2 variations of the knowledge base, and various parameters
have been explored in the context of Topic Modeling. Two evaluation methods, namely Topic Coherence and Document Classification, have been used to measure the success of the experiments. In light of these results, this work has made several contributions.
In the embedding methods comparison, Topic Coherence and Document Classification yield different performance for each method, but the results have similarities. The most obvious pattern is the performance of Analogy: it outperforms all other methods at higher topic numbers with Model B. For lower topic numbers, simpler methods like TransE and TransR produce the best results. Overall, the best average scores come from HolE.
The Knowledge Graph Extension scores results similar to the original graph on Topic Coherence, but on Document Classification it clearly improves the accuracy. With increased parameters and embedding dimension, the improvements of the Knowledge Graph Extension are clearer, especially in Document Classification.
The best-performing embedding method, Analogy with Model B, achieves an average improvement of 0.50% over the baseline method (KGE-LDA using TransE) in Topic Coherence, and an average improvement of 1.01% over the same baseline in Document Classification. The Knowledge Graph Extension achieves an average improvement of 0.52% over the Original Knowledge Graph in Topic Coherence, and an average improvement of 0.77% in Document Classification.
As a closing remark, the best combination of embedding method, incorporation model, and parameters is Analogy with Model B at high topic numbers, with a high embedding dimension. The extension of the knowledge base, along with a high embedding dimension, enables more information to be encoded into the vectors, which in turn creates a more accurate representation of the entities compared to the Original Knowledge Graph. This performance improvement of the Extended Knowledge Graph comes with a 578% growth in the size of the graph.
It has been shown that Analogy is the most suitable embedding method. Secondly, the results clearly show that the Extended Knowledge Graph improved both the Topic Coherence score and the Document Classification accuracy.
Deeper investigation of a few points could provide further improvements to the solution. For the embedding method comparison, the different methods were tested with the same parameters, which provided an equal ground for the methods to compete with each other; however, a comprehensive parameter grid search for each embedding method could increase their performance and reveal more realistic values. Finally, as the specific knowledge graph extension used in the experiments yielded better results, the capabilities of the knowledge graph can be explored further.

References
[Blei and Jordan 2003] Blei, D. M., and Jordan, M. I. 2003. Modeling annotated data. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 127–134. ACM.
[Blei and Lafferty 2006] Blei, D., and Lafferty, J. 2006. Correlated topic models. Advances in Neural Information Processing Systems 18:147.
[Blei, Ng, and Jordan 2003] Blei, D. M.; Ng, A. Y.; and Jordan, M. I. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3(Jan):993–1022.
[Bollacker et al. 2008] Bollacker, K.; Evans, C.; Paritosh, P.; Sturge, T.; and Taylor, J. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 1247–1250. ACM.
[Bordes et al. 2013] Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; and Yakhnenko, O. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, 2787–2795.
[Cao et al. 2015] Cao, Z.; Li, S.; Liu, Y.; Li, W.; and Ji, H. 2015. A novel neural topic model and its supervised extension. In AAAI, 2210–2216.
[Fan et al. 2008] Fan, R.-E.; Chang, K.-W.; Hsieh, C.-J.; Wang, X.-R.; and Lin, C.-J. 2008. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research 9(Aug):1871–1874.
[Hayashi and Shimbo 2017] Hayashi, K., and Shimbo, M. 2017. On the equivalence of holographic and complex embeddings for link prediction. arXiv preprint arXiv:1702.05563.
[Hinton and Salakhutdinov 2009] Hinton, G. E., and Salakhutdinov, R. R. 2009. Replicated softmax: an undirected topic model. In Advances in Neural Information Processing Systems, 1607–1614.
[Joulin et al. 2016] Joulin, A.; Grave, E.; Bojanowski, P.; and Mikolov, T. 2016. Bag of tricks for efficient text classification. CoRR abs/1607.01759.
[Lin et al. 2015a] Lin, Y.; Liu, Z.; Luan, H.; Sun, M.; Rao, S.; and Liu, S. 2015a. Modeling relation paths for representation learning of knowledge bases. arXiv preprint arXiv:1506.00379.
[Lin et al. 2015b] Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; and Zhu, X. 2015b. Learning entity and relation embeddings for knowledge graph completion. In AAAI, volume 15, 2181–2187.
[Liu, Wu, and Yang 2017] Liu, H.; Wu, Y.; and Yang, Y. 2017. Analogical inference for multi-relational embeddings. arXiv preprint arXiv:1705.02426.
[Mikolov et al. 2013] Mikolov, T.; Chen, K.; Corrado, G.; Dean, J.; Sutskever, L.; and Zweig, G. 2013. word2vec. URL https://code.google.com/p/word2vec.
[Miller 1995] Miller, G. A. 1995. WordNet: a lexical database for English. Communications of the ACM 38(11):39–41.
[Newman et al. 2010] Newman, D.; Lau, J. H.; Grieser, K.; and Baldwin, T. 2010. Automatic evaluation of topic coherence. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 100–108. Association for Computational Linguistics.
[Newman, Chemudugunta, and Smyth 2006] Newman, D.; Chemudugunta, C.; and Smyth, P. 2006. Statistical entity-topic models. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 680–686. ACM.
[Nguyen et al. 2015] Nguyen, D. Q.; Billingsley, R.; Du, L.; and Johnson, M. 2015. Improving topic models with latent feature word representations. Transactions of the Association for Computational Linguistics 3:299–313.
[Nickel et al. 2016] Nickel, M.; Rosasco, L.; Poggio, T. A.; et al. 2016. Holographic embeddings of knowledge graphs. In AAAI, volume 2, 3–2.
[Nigam et al. 2000] Nigam, K.; McCallum, A. K.; Thrun, S.; and Mitchell, T. 2000. Text classification from labeled and unlabeled documents using EM. Machine Learning 39(2-3):103–134.
[Pennington, Socher, and Manning 2014] Pennington, J.; Socher, R.; and Manning, C. D. 2014. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), 1532–1543.
[Řehůřek and Sojka 2010] Řehůřek, R., and Sojka, P. 2010. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, 45–50. Valletta, Malta: ELRA. http://is.muni.cz/publication/884893/en.
[Socher et al. 2013a] Socher, R.; Bauer, J.; Manning, C. D.; et al. 2013a. Parsing with compositional vector grammars. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 455–465.
[Socher et al. 2013b] Socher, R.; Perelygin, A.; Wu, J.; Chuang, J.; Manning, C. D.; Ng, A.; and Potts, C. 2013b. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 1631–1642.
[Srivastava, Salakhutdinov, and Hinton 2013] Srivastava, N.; Salakhutdinov, R.; and Hinton, G. 2013. Fast inference and learning for modeling documents with a deep Boltzmann machine.
[Trouillon et al. 2016] Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; and Bouchard, G. 2016. Complex embeddings for simple link prediction. In International Conference on Machine Learning, 2071–2080.
[Wang et al. 2014] Wang, Z.; Zhang, J.; Feng, J.; and Chen, Z. 2014. Knowledge graph embedding by translating on hyperplanes. In AAAI, volume 14, 1112–1119.
[Yang et al. 2014] Yang, B.; Yih, W.-t.; He, X.; Gao, J.; and Deng, L. 2014. Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575.
[Yao et al. 2017] Yao, L.; Zhang, Y.; Wei, B.; Jin, Z.; Zhang, R.; Zhang, Y.; and Chen, Q. 2017. Incorporating knowledge graph embeddings into topic modeling. In AAAI, 3119–3126.