=Paper=
{{Paper
|id=Vol-2721/paper572
|storemode=property
|title=Towards Utilizing Knowledge Graph Embedding Models for Conceptual Clustering
|pdfUrl=https://ceur-ws.org/Vol-2721/paper572.pdf
|volume=Vol-2721
|authors=Mohamed H. Gad-Elrab,Vinh Thinh Ho,Evgeny Levinkov,Trung-Kien Tran,Daria Stepanova
|dblpUrl=https://dblp.org/rec/conf/semweb/Gad-ElrabHLTS20
}}
==Towards Utilizing Knowledge Graph Embedding Models for Conceptual Clustering==
<pdf width="1500px">https://ceur-ws.org/Vol-2721/paper572.pdf</pdf>
<pre>
Towards Utilizing Knowledge Graph Embedding
      Models for Conceptual Clustering

 Mohamed H. Gad-Elrab1,2 , Vinh Thinh Ho1 , Evgeny Levinkov2 , Trung-Kien
                     Tran2 , and Daria Stepanova2
          1
             Max-Planck Institute for Informatics, Saarbrücken, Germany
                      {gadelrab,hvthinh}@mpi-inf.mpg.de
           2
             Bosch Center for Artificial Intelligence, Renningen, Germany
                      {firstname.lastname}@de.bosch.com

      Abstract. We propose a framework to utilize Knowledge Graph (KG)
      embedding models for conceptual clustering, i.e., the task of clustering
      a given set of entities in a KG based on the quality of the resulting
      descriptions for the clusters. Specifically, prominent regions in the em-
      bedding space are detected using Multicut clustering algorithm, and then
      the queries describing/covering the entities within these regions are ob-
      tained by rule learning. Finally, we evaluate these queries using different
      metrics. In our preliminary experiments, we compare the suitability of
      well-known KG embedding models for conceptual clustering. The re-
      ported results provide insights for the capability of these embeddings
      to capture graph topology and their applicability for data mining tasks
      beyond link prediction.


Motivation. Knowledge graphs (KGs) are collections of hsubject, predicate,
objecti triples representing factual information in various domains. KGs are
widely applied in semantic search, question answering and data analytics.
    One of the important tasks in KG construction and curation is the con-
ceptual clustering [14, 5], which concerns splitting a given set of entities into
groups and finding descriptions for them in the form of conjunctive queries, e.g.,
Q(X ) ← created (Y , X ), type(Y , productOwner ), belongsTo(Y , boschGMBH ) de-
scribing Bosch products. Conceptual clustering is useful in a number of appli-
cations. First, the clusters along with their descriptions facilitate the learning
of new emerging concepts (aka, types) characterising prominent common entity
properties. Second, the clusters can be exploited to refine existing KG concepts,
which is useful in the context of ontology evolution. Finally, the intentionally de-
fined groupings potentially optimize semantic search and knowledge discovery.

   Recent advances in (deep) representation learning on KGs have proved to
be effective for specialized tasks such as KG completion [16] and conjunctive
query (CQ) answering [8, 13, 6]. In particular, in [13] queries are embedded as
boxes/hyper-rectangles in the embedding space, where interior points of these
boxes correspond to the set of query’s answers.
  Copyright c 2020 for this paper by its authors. Use permitted under Creative Com-
  mons License Attribution 4.0 International (CC BY 4.0).
    In this preliminary study, our goal is to analyze the suitability of embed-
ding models for conceptual clustering. More specifically, we aim at investigating
whether prominent regions in the embedding space constructed by existing em-
beddings correspond to any conjunctive queries. Comparing embeddings w.r.t.
their capability of capturing such queries opens new perspectives for their ap-
plicability for various data mining tasks such as conceptual clustering.
Framework Description. To start with, we introduce our proposed framework
for utilizing KG embeddings for conceptual clustering. Given a KG and a set of
target entities, we first compute KG embedding (see, e.g., [16]). Once the entities
and relations are embedded into the vector space, we construct a complete graph
G = (V, E) over the target entities and compute pair-wise costs Φ : E 7→ R
between entity pairs using the cosine similarity of their embedding vectors.
    To detect prominent regions in the embedding space we propose to use the
multicut graph clustering approach [1], as it is an effective clustering method,
which does not require the number of clusters as input. Surprisingly, multicut
method has previously not been applied in the context of KGs to the best of our
knowledge, despite it’s obvious advantages compared to other algorithms, e.g.,
the small number of parameters to be tuned. A multicut of a graph is a subset
of its edges s.t. no cycle in the graph intersects this subset exactly once. If we
label edges in the multicut as 1, and all other edges as 0, the set of all valid
multicuts can be formalized by the following set of linear inequalities:
                                                                                       
                                                                        X              
          YG =       y : E 7→ {0, 1} | ∀T ∈ cycles(G), ∀e ∈ T : ye ≤               yf       (1)
                                                                                       
                                                                       f ∈T \{e}


 where ye and yf are labels of the edges e and f respectively obtained using the
labeling function y. Considering only chordless cycles is sufficient [1], and any
valid multicut y ∈ YG uniquely defines a graph decomposition. Given the above
definitions, we can formulate the minimum cost multicut problem:
                                           X
                                    min          (Φe + β) ye                                (2)
                                    y∈YG
                                           e∈E


    By solving (2) using efficient local search methods [10], we find an optimal
multicut and the respective optimal graph decomposition, which allows us to
detect prominent regions in the embedding space without knowing their number
even for large KGs. The cutting prior value β ∈ R can be tuned to discover more
(β < 0) or less (β > 0) clusters, than given by the pair-wise costs Φ only.
    The constructed regions, i.e., clusters, in the vector space are then mapped to
CQs by learning Horn rules. E.g., the rule r : belongsTo(X , c) ← created (Y , X ),
type(Y , productOwner ), belongsTo(Y , boschGMBH ) states that the answers to
the CQ q(X)←created (Y , X ),type(Y ,productOwner ),belongsTo(Y ,boschGMBH )
belong to the cluster c. For learning such rules, we adapt [7] to capture constants
in rule heads. We assess the quality of the rules using the following measures.
 • Per cluster coverage (cov) for a rule r, cluster c, and KG G, written as
    cov (r , c, G), is defined as the ratio of entities covered by r within the cluster
    c over the cardinality of c [11].
 • Exclusive coverage (exc) estimates exclusiveness of the rule r to the cor-
   responding cluster c compared to other clusters from a set of clusters S.
   Formally,
                                  0, if 0min {cov (r ,c,G)−cov (r ,c 0 ,G)}≤0
                                
                                       c ∈S\c
                                
                                
               exc(r ,c,S,G)=                          P
                                                               cov (r,c0 ,G)
                                                     c0 ∈S\c
                                
                                    cov (r, c, G)−                             , otherwise.
                                
                                                           |S\c|


 • Weighted Relative Accuracy (wra) measures unusualness of patterns (see [11]).
Experiments. We aim at: (Q1) comparing KG embeddings w.r.t. their suit-
ability for conceptual clustering, (Q2) evaluating the correlation between the
predictive quality of embeddings and their performance in conceptual clustering,
and (Q3) verifying the effectiveness of the multicut clustering algorithm in our
context compared to other common clustering algorithms.
    We implemented our framework in Python, using the KG embedding models
from Ampligraph [2] and Multicut algorithm [10]. We experimented with TransE
(T) and ComplEx (C), which are respectively representatives of translation-
based and linear map embedding models. We trained the two models for 100
epochs with embedding dimension set to 100. Experimenting with other embed-
ding models is left for future work.
    As described above, the cosine similarity is used to compute the pairwise
distances between entities. We tried several cutting prior values β for the Mul-
ticut algorithm and report here the best results. The Multicut is compared to
commonly used clustering algorithms, namely, DBSCAN [4], k-means [12], and
Spectral clustering [9], whose parameters are tuned and the best results are
reported. For fair comparison, we pass the number of clusters produced by Mul-
ticut to k-means and Spectral clustering, as both of these algorithms require this
parameter as input.
    We performed experiments on widely-used relational datasets: Hepatitis, Mu-
tagensis, WebKB, and Terrorist Attacks (see [3] for dataset statistics). In addi-
tion, we experimented with a large scale dataset YAGO-Artwork which contains
3.9K target entities for books, songs, and movies, randomly selected from YAGO
KG [15]. Note that, while the conceptual clustering procedure is applied only on
a subset of entities i.e., target entities, the embeddings are trained on the full
KGs to ensure that the semantic data for all of the entities is preserved.
    We evaluated the predictive quality of the trained embeddings using standard
Mean Reciprocal Rank (MRR) and Hit@k measures, and assessed the quality of
the mined queries using the functions cov, exc, and wra introduced above.
Results. Table 1 shows the predictive quality of the KG embedding models,
where ComplEx consistently achieves better results than TransE on all datasets,
except YAGO. Training embeddings for this KG is particularly challenging for
both models, and ComplEx could not converge within the training epochs due
to its complexity.
    In Table 2, we report the estimated quality of the discovered queries. For
each dataset, we present the average cov, exc, wra measures, and the number of
                   Table 1: The predictive quality of KG embedding models
                                           TransE                 ComplEx
                                    MRR Hit@1 Hit@3 Hit@10 MRR Hit@1 Hit@3 Hit@10
                    Hepatitis   0.929 0.903 0.947                 0.974 0.946 0.919 0.970                  0.988
                    Mutagenesis 0.896 0.840 0.959                 0.983 0.953 0.919 0.988                  0.998
                    WebKB       0.210 0.158 0.223                 0.293 0.415 0.329 0.441                  0.584
                    Terrorist   0.320 0.140 0.429                 0.623 0.930 0.876 0.984                  0.998
                    YAGO-Art 0.105 0.085 0.111                   0.133 0.000 0.000 0.000                    0.000


  Table 2: Quality of learned rules; “–” refers to the failure of finding clusters
                   Hepatitis          Mutagenesis                    WebKB                     Terrorist              YAGO-Art
Methods
             cov    exc   wra cls   cov   exc   wra cls    cov       exc       wra cls   cov    exc   wra cls   cov    exc   wra cls
Multicut-T 0.567 0.567 0.037     6 0.917 0.667 0.004   4 0.948 0.147 0.009            5 0.789 0.382 0.048    4 0.763 0.695 0.053   3
Multicut-C 0.826 0.764 0.005     4 0.773 0.625 0.097   2 0.849 0.271 0.002            2 0.797 0.713 0.048    3 0.820 0.211 0.000   6
DBSCAN-T 0.707 0.582 0.028       3     –     –     –   –         –         –      –   – 0.757 0.423 0.005    2 0.908 0.578 0.072   4
DBSCAN-C 0.732 0.616 0.036       2 0.917 0.750 0.016   5         –         –      –   – 0.745 0.564 0.005    3     –     –     –   –

K-Means-T    0.610 0.491 0.065   6 0.793 0.041 0.012   4 0.969 0.015 0.013            5 0.622 0.231 0.062    4 0.672 0.404 0.094   3
K-Means-C    0.658 0.561 0.110   4 0.941 0.882 0.220   2 0.951 0.105 0.026            2 0.774 0.516 0.097    3 0.751 0.006 0.001   6
Spectral-T   0.592 0.288 0.078   6 0.932 0.035 0.012   4 0.966 0.041 0.018            5 0.667 0.270 0.062    4 0.675 0.400 0.094   3
Spectral-C   0.775 0.616 0.055   4 1.000 0.482 0.008   2 0.919 0.427 0.003            2 0.839 0.750 0.047    3 0.810 0.005 0.001   6


discovered clusters (cls). Conceptual clustering over ComplEx results in better
average exc and wra, which answers our first research question (Q1). In addition,
the higher predictive quality of ComplEx in Table 1, supports the hypothesis
(Q2), suggesting correlation between the predictive quality and the quality of
the discovered queries. This also holds for YAGO, where TransE performs better
than ComplEx in both prediction and clustering.
    Regarding (Q3), multicut achieved better results compared to DBSCAN in
the majority of datasets. Moreover, DBSCAN failed in several cases regardless
of the used parameters. Interestingly, even compared to other clustering algo-
rithms that require the number of clusters, Multicut performed better on several
datasets. This demonstrates the suitability of this algorithm for the KG domain.
    Finally, Table 3 shows example rules mined from YAGO over TransE. E.g.,
r1 describes entities in c1 as “artifacts that have a director and an actor”, while
r3 describes entities in c2 as: “artifacts created by an award winner”.
Conclusion and Outlook. We have introduced a framework for utilizing KG
embeddings for the task of conceptual clustering, which exploits the Multicut
algorithm [1, 10] for detecting prominent regions in the embedding space and
maps them to conjunctive queries over KGs using an extension of [7]. We believe
that our framework and preliminary experimental results contribute to a better
understanding of the strengths and weaknesses of the existing KG embeddings
beyond fact prediction.
    For future work, we plan to extend our framework to account for background
knowledge in the form of ontologies. Additionally, we will consider comparing
the suitability of other embedding models for the task of conceptual clustering,
especially those trained on complex patterns [13]. Another interesting direction
is to investigate utilizing the learned rules for guiding the clustering process.
     Table 3: Example rules from YAGO-Art dataset learned over TransE.
    Query                                                            cov exc wra
    r1 : belongsTo(X , c1 ) ← directed (Y , X ), acted (Z , X )      0.768 0.749 0.153
    r2 : belongsTo(X , c1 ) ← actedIn(Y , X ), type(Y , film actor ) 0.724 0.708 0.144
    r3 : belongsTo(X , c2 ) ← created (Y , X ), hasWonPrize(Y , Z ) 0.564 0.424 0.093
    r4 : belongsTo(X , c2 ) ← created (Y , X ), type(Y , writer )    0.504 0.420 0.091

References
 1. Chopra, S., Rao, M.: The partition problem. Math. Prog. 59(1–3), 87–115 (1993)
 2. Costabello, L., Pai, S., Van, C.L., McGrath, R., McCarthy, N., Tabacof, P.: Ampli-
    Graph: a Library for Representation Learning on Knowledge Graphs (Mar 2019)
 3. Dumancic, S., Blockeel, H.: An expressive dissimilarity measure for relational clus-
    tering over neighbourhood trees. MLJ (2017)
 4. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for dis-
    covering clusters in large spatial databases with noise. pp. 226–231. AAAI Press
    (1996)
 5. Fanizzi, N., d’Amato, C., Esposito, F.: Conceptual clustering and its application
    to concept drift and novelty detection. In: ESWC. pp. 318–332 (2008)
 6. Friedman, T., den Broeck, G.V.: Symbolic querying of vector spaces: Probabilistic
    databases meets relational embeddings. CoRR 2002.10029 (2020)
 7. Galárraga, L., Teflioudi, C., Hose, K., Suchanek, F.M.: Fast rule mining in onto-
    logical knowledge bases with amie++. The VLDB Journal 24(6), 707–730 (2015)
 8. Hamilton, W.L., Bajaj, P., Zitnik, M., Jurafsky, D., Leskovec, J.: Embedding logical
    queries on knowledge graphs. In: NeurIPS 2018. pp. 2030–2041 (2018)
 9. Jianbo Shi, Malik, J.: Normalized cuts and image segmentation. IEEE Transactions
    on Pattern Analysis and Machine Intelligence 22(8), 888–905 (2000)
10. Keuper, M., Levinkov, E., Bonneel, N., Lavoué, G., Brox, T., Andres, B.: Efficient
    decomposition of image and mesh graphs by lifted multicuts. In: ICCV (2015)
11. Lavrač, N., Flach, P., Zupan, B.: Rule evaluation measures: A unifying view. In:
    ILP (1999)
12. MacQueen, J., et al.: Some methods for classification and analysis of multivariate
    observations. In: Symp. on math. stat. and prob. vol. 1, pp. 281–297 (1967)
13. Ren, H., Hu, W., Leskovec, J.: Query2box: Reasoning over knowledge graphs in
    vector space using box embeddings. In: ICLR 2020 (2020)
14. Suárez, A.P., Mart’inez Trinidad, J.F., Carrasco-Ochoa, J.A.: A review of concep-
    tual clustering algorithms. Artif. Intell. Rev. 52(2), 1267–1296 (2019)
15. Suchanek, F.M., Kasneci, G., Weikum, G.: Yago: A Core of Semantic Knowledge.
    In: Proceedings of WWW. pp. 697–706 (2007)
16. Wang, Q., Mao, Z., Wang, B., Guo, L.: Knowledge graph embedding: A survey of
    approaches and applications. IEEE Trans. Knowl. Data Eng. 29(12), 2724–2743
    (2017)

</pre>