<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Eficient Explanation of Predictions on DL Knowledge Graphs through Enhanced Similarity Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Claudia d'Amato</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Benedetti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Fanizzi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Informatica, Università degli Studi di Bari Aldo Moro</institution>
          ,
          <addr-line>Campus Via Orabona 4, 70215 Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Knowledge Graphs are inherently incomplete, so the relationships that hold between their entities have to be discovered on the go. Generating explanations for such predictions has become a fundamental task in the perspective of eXplainable AI. This task boils down to finding meaningful (knowledge-level) reasons for predicting a certain relationship to hold between entities. To date, efective link prediction methods are based on embedding models, that represent entities and relationships in a vector space to be learned. Our goal is to extend a semantically enriched approach to generating explanations by exploring graph searches for patterns in similar situations. These patterns justify predictions made through an underlying embedding model, leading to the production of explanations. Since a bottleneck of the method is the search for similar entities and relations in the embedding space, we propose a solution based on the integration of clustering to make this search more eficient. This solution has been empirically evaluated with new experiments, proving the improvement in eficiency while preserving the efectiveness.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;embedding models</kwd>
        <kwd>knowledge graph</kwd>
        <kwd>explanation</kwd>
        <kwd>similarity</kwd>
        <kwd>clustering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Knowledge Graphs (KGs) are multi-relational graphs designed to organize and share real-world
knowledge where nodes represent entities of interest and edges represent diferent types of
relationships between such entities [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We will focus on the latter category of KGs embodied
as shared ontologies, ultimately expressed in Description Logics (DLs) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and rely on reasoning
to better exploit the wealth of available underlying knowledge.
      </p>
      <p>Due to the inherent incompleteness and heterogeneity of the sources for large KGs, two of
the most compelling basic tasks on them are link prediction and triple classification that amount,
respectively, predicting an unknown component of a triple and assessing the truth of a new or
existing triple. For these purposes, lots of numeric-statistical models have been proposed, in
particular methods that learn vector representations (embedding models) that have been shown
to scale even to very large KGs. The downside is that these models are dificult for human
experts to interpret and verify. Thus an elusive aspect concerns the trust in predictions made
through such models (e.g., in the context of a KG about drugs, the prediction of a side efect
for a given compound): the more complex and accurate the models get, the less explainable
become the reasons supporting their predictions. As a consequence, providing explanations for
the predicted results has become increasingly important.</p>
      <p>
        Current solutions to the problem of computing explanations (e.g. see [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3, 4, 5</xref>
        ]) can be
distinguished into two main categories [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]: those related to the internal mechanisms of a model, and
those that can motivate the output predictions. Specifically, two possible approaches can be
identified [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]: Pattern-based methods guide the process of creating numerical representations of
the data contained in the KG by narrowing the search space so that each dimension corresponds
to a pattern. A-posteriori methods aim at constructing explanations after the model has delivered
its predictions; they do not explain the reasons for which the internal mechanism of the model
produced a given output, but try to find a suitable explanation based on the observed output
and on the model input, i.e., the KG evidence.
      </p>
      <p>We will focus on the latter approach as it allows to adopt link prediction models based on
numerical representations of the data that are more scalable with respect to the former, and
thereby more suitable for real large-scale KGs and more capable of generating explanations
for the predictions made. In fact, there are only a few examples of approaches that are able to
explain link predictions with KGs. We will focus on an improved a-posteriori method that can
provide semantic-based explanations for link prediction on KGs. In particular, given a prediction,
the goal is to understand why it was made, giving valuable reasons to enable the user to judge
the output, understand the motivations, and thus increase confidence in the prediction.</p>
      <p>
        Specifically, moving from the SemanticCrossE explanation approach [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], that is based on
eliciting analogous justification patterns (exploiting a semantic similarity measure) to build
efective explanations, we propose a more eficient solution that exploits a clustering structure of
the embeddings to speed up the search and, consequently, the entire explanation process. This
is motivated by the fact that whilst the adoption of the semantic measures employed yielded a
greater ability to capture the underlying KG semantics which improved the efectiveness of the
process, it also showed noticeable limitations in eficiency which restrains the scalability of the
approach to the large dimensions of current KGs. The new solution preserves the handling of
semantics that characterizes the approach while making the explanation process more applicable
in practice. Indeed, in our comparative study, we show experimentally that while the adoption
of clustering results in a significantly more eficient explanation process, the quality of the
explanations generated using either variant of the method is substantially comparable.
      </p>
      <p>The rest of this paper is organized as follows. The functional foundations of our solution are
recalled in §2. The proposed explanation process is presented in §3, while in §4 we illustrate
the comparative experimental study in terms of eficiency and efectiveness of the variants. §5
summarizes the conclusions and delineates a possible further extension.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Basics: Embedding Models for KGs and Explanations</title>
      <p>Embedding models, a popular solution to link prediction problems on KGs are briefly recalled,
then we focus on the task of explaining predictions and specifically on SemanticCrossE, a
method that can exploit schema-level semantics which was targeted for further enhancements.</p>
      <sec id="sec-2-1">
        <title>2.1. Knowledge Graphs and Embedding Models</title>
        <p>
          A Knowledge Graph [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] can be defined as a data structure denoted with (ℰ , ℛ) involving a
set of the nodes ℰ , or entities, and a set of arcs ℛ, i.e. the relationships which connect entities.
Adopting the RDF data model, a KG can be regarded as a set of triples ⟨, , ⟩, i.e. subject,
predicate, and object where ,  ∈ ℰ and  ∈ ℛ. In RDF, the terms are denoted by the elements
of the sets  (URIs), ℬ (blank nodes) and ℒ (literals). Hence an RDF Graph is a set triples ⟨, , ⟩
with:  ∈  ∪ ℬ,  ∈  , and  ∈  ∪ ℬ ∪ ℒ [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>
          Several models have been proposed for embedding KGs in low-dimensional vector spaces [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ],
that learn a unique distributed representation (or embedding) for each entity and predicate
therein considering diferent types of components (e.g. point-wise, complex, discrete, Gaussian,
manifold). Here we adopt an embedding vector-space based on R.
        </p>
        <p>Regardless of the learning procedure, these models share a fundamental characteristic: given
(ℰ , ℛ), they represent each entity  ∈ ℰ as a continuous embedding vector e ∈ R, where
the dimension  ∈ N is a user-defined hyperparameter. Similarly, each predicate  ∈ ℛ is
associated to a scoring function  : R × R → R. For each pair of entities ,  ∈ ℰ , the score
(e, e) measures the confidence in the fact that the statement encoded by ⟨, , ⟩ holds true.</p>
        <p>The embedding of all entities and predicates in  is learned by minimizing a margin-based
loss function (i.e. one that takes into account diferences with their sign).</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Computing Explanations of Link Predictions</title>
        <p>
          SemanticCrossE, a method for computing explanations for link predictions on KGs, was
devised to better exploit the underlying semantics [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Given a predicted triple, the formulation
of an explanation is based on searching for paths of relations that link its subject to the object.
We are not interested in the orientation of the edges. This search is driven by the similarities
among both relation embeddings and entity embeddings, and makes structural comparisons
with other paths in the KG: the reliability of an explanation is reinforced by the presence of
similar paths (referred to as support).
        </p>
        <p>
          Example 1 ([
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]). Given the predicted triple ⟨, fatherOf,  ⟩ a suitable explanation may be
given by the chain pattern (path) ⟨, hasWife, ⟩, ⟨, hasChild,  ⟩ that is supported by an
analogous situation, the occurrence of a similar triple ⟨, fatherOf, ⟩ that is known to be true
(not just a hypothesis) for which the explanation is ⟨, hasWife, ⟩, ⟨, hasChild, ⟩.
        </p>
        <p>
          Given a predicted triple ⟨ℎ, , ⟩ for the query ⟨ℎ, , ?⟩, the main idea consists in looking
for (short) paths from ℎ to , and provide them as explanations (see [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] for a longer example
on the application of this method). This search aims at finding analogous situations that can
support the explanation (similarity will be discussed in the next §3.1): this requires a structural
comparison between paths (patterns) that support the explanation. The process is summarized
in Algo. 1, whose steps are described as follows:
        </p>
        <p>Given the predicted ⟨ℎ, , ⟩:</p>
        <sec id="sec-2-2-1">
          <title>1. Find the set  of the  closest relationships to  (line 6); 2. Search for the set (ℎ, ) of (all) paths between ℎ and  (lines 7-8):</title>
          <p>Algorithm 1 Explanation and support of a prediction
• a maximum length is fixed to limit the search space; considering only lengths 1 and
2, then six types of patterns are possible: 1 = {⟨ℎ, , ⟩}, 2 = {⟨, , ℎ⟩}, 3 =
{⟨′, , ℎ⟩, ⟨′, ′, ⟩}, 4 = {⟨′, , ℎ⟩, ⟨, ′, ′⟩}, 5 = {⟨ℎ, , ′⟩, ⟨′, ′, ⟩},
6 = {⟨ℎ, , ′⟩, ⟨, ′, ′⟩}, where  is a relationship similar to , ′ stands for any
other relationship, and ′ for any other entity;
• a direct search is employed to find similar paths of type 1 and 2, and bidirectional
search to find paths of types 3 through 6.
3. Find the set ℎ of the  closest entities to ℎ (line 9);
• note that considering ℎ ∈ ℎ, entities  s.t. ⟨ℎ, , ⟩ ∈  are also determined.</p>
          <p>(ℎ, ) denotes the set of paths involving the entities found in this step.
4. Search for similar structures to support the explanation (lines 10-13):
• if there exists a path  from ℎ to  (i.e. similar to ⟨ℎ, , ⟩ determined at the
previous step) whose triples  can be derived from , then  is an explanation for
⟨ℎ, , ⟩ (this is denoted with a special triple: ⟨ℎ, , ⟩);
• triples in  describe an analogous situation w.r.t. ⟨ℎ, , ⟩ involving similar entities
and relationships: then the support is extended with  which joins a similar head
to a similar tail through a relation that is similar to , i.e. analogously to ⟨ℎ, , ⟩.</p>
          <p>
            In the original formulation of CrossE [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ], the analogy between pairs of entities or relationships
was assessed using the Euclidean distance, applied to their embeddings. However it is well
known that this metric may be inadequate when the assumption of isotropy for the underlying
vector-space does not hold. Considering the crucial role of similarity in the explanation process,
other measures have been considered in SemanticCrossE.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Extending the Explanation Method</title>
      <p>Moving from the initial ideas behind CrossE, various extensions were foreseen and finally
incorporated in the ultimate implementation of the explanation method, namely:
• further measures that are capable of capturing the underlying semantics of the KG to
better direct the process towards more accurate explanations;
• the usage of such measures within clustering algorithms is intended to produce groupings
of embeddings that can be exploited to accelerate the key task of similarity search.</p>
      <sec id="sec-3-1">
        <title>3.1. Extended Measure</title>
        <p>The Semantic Cosine similarity measure was motivated by the purpose of enhancing the
explanation process by better exploiting the available knowledge. Compared to the Euclidean norm,
it can capture additional and/or complementary information in the resulting embedding space.
The semantics of the KGs, particularly when represented by rich representation languages, such
as RDF-S and OWL, is often disregarded. Being able to exploit the KG semantics may lead to
generate more accurate explanations for link predictions.</p>
        <p>
          Hence, the semantic Cosine measure was introduced [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] to better assess the similarity of
two vector embeddings on the ground of additional semantic information. Such information is
captured by a score function defined to this purpose.
        </p>
        <p>
          We consider the set  of the classes occurring in (ℰ , ℛ), and the functions  : ℰ → ,
 : ℛ → ,  : ℛ →  that return, resp., the conjunction of the classes an entity belongs
to, the domain and range of a relation. We also resort to the retrieval service [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], denoted in
the following as function  :  → 2ℰ , a reasoning service that returns the entities in (ℰ , ℛ)
that can be proven to belong to a given class.
        </p>
        <p>The semantic Score function for pairs of entities , ′ ∈ ℰ is defined by:
sScore(, ′) = |ret[() ⊓ (′)]| .</p>
        <p>|ret[() ⊔ (′)]|
(1)
(2)
(3)</p>
        <sec id="sec-3-1-1">
          <title>Analogously, given any two relationships , ′ ∈ ℛ, it is defined:</title>
          <p>sScore(, ′) = |ret[() ⊓ (′)]| + |ret[() ⊓ (′)]|</p>
          <p>
            |ret[() ⊔ (′)]| |ret[() ⊔ (′)]|
Given (ℰ , ℛ), the semantic Cosine measure for two entities , ′ ∈ ℰ is defined by:
semCos, (, ′) =  · sScore(, ′) +  · simcos(e, e′)
where e represents the respective embedding vector and ,  ∈ [
            <xref ref-type="bibr" rid="ref1">0, 1</xref>
            ] ..  +  = 1. In the
case of relations , ′ ∈ ℛ the measure is defined analogously.
          </p>
          <p>
            Similarly, the semantic Score for relations can be computed by considering their domains
and/or ranges, that are ultimately class expressions, and summing the degree of similarity
between the domains and the degree of similarity between the ranges.However computing
concept retrieval by using a standard reasoner may turn out to be computationally prohibitive,
or even infeasible from a practical viewpoint, when very large KGs, consisting of millions of
triples, are considered. For this reason, an approximated form of the semantic Cosine measure
and more specifically of the semantic Score function was proposed [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ].
          </p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Learning Clustering Structures to Enhance Similarity Search</title>
        <p>
          Clustering has been shown to provide an added value to the explanation process [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Moving
from the base approach delineated in the previous section, the selection of the most similar
entities and relations can be optimized by grouping their embeddings in clusters.
        </p>
        <p>Thus, by prepending a preliminary phase to find good clustering structures over the
embedding vectors it is possible to guide the search for the most similar relations/entities by
considering only relation/entity embeddings within a single cluster. To this purpose, vector
similarity measures can be employed for an eficient computation. Of course with large numbers
of clusters, they will tend to be less crowded hence some semantically similar embeddings may
be missing from the targeted cluster. This may cause finding sub-optimal neighbors hence
limiting the efectiveness of the explanation method. Hence a trade-of has to be made between
number of clusters and quality of the resulting explanations.</p>
        <p>Two simple unsupervised algorithms, namely -Means and Agglomerative clustering, were
considered to find a given number of groupings exploiting the similarity measures employed also
in the explanation process. More complex tree-structures borrowed from Nearest Neighbors
methods may represent a natural extension to the approach we adopt in this work.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>The objective of the experiments was twofold: first, to analyze the quality of the explanations
generated for link predictions, and second, to assess how the choice of the similarity measure
afected the quality of the explanations produced by the prediction method. The study also
aimed at exploring how clustering structures could help speed up similarity search.</p>
      <p>Clustering techniques were exploited in order to speed up the similarity search in the
presented explanation methods while preserving their efectiveness. The adopted similarity measure
varied in accordance to the metric employed in the explanation phase. The clustering phase
preceded the explanation process. This phase consisted in finding, for each embedding, the
(three) closest neighbors. Then the explanations were produced for the predicted triples, and the
outcomes of the process were evaluated via the same metrics adopted in the former experiment.
Code and datasets employed are publicly available1.</p>
      <p>
        Evaluation Metrics To assess the quality of the predictions, we adopted the metrics employed
in the original evaluation of CrossE [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] also in combination with other measures [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], namely:
• Elapsed Time per experiment to assess the gain in eficiency resulting from employing
pre-computed clustering structures in the key-task of similarity search;
1https://github.com/itsfrank98/SemanticCrossE/tree/clustering_2
• Recall: proportion of predicted triples for which the model can generate explanations:
#EPs/#Ps, where #EPs counts the predictions with at least one explanation and #Ps
stands for the total number of predictions;
– conforming to the mentioned previous experiments, only short explanations paths
(maximal length 2) were considered, which limits the number of possible
explanations, maintaining a greater focus on their quality and brevity;
– note that the number of explanations generated per predicted triple is not taken
into account: the recall is not afected by this number;
• Average Support: number of explanations generated, on average, for each
prediction: |1| ∑︀ ∈ |( )|, counting the number of explanations |( )|
generated for each predicted triple  , where  is the set of predictions for a query:
{ |  = ⟨ℎ, , ⟩ predicted for ⟨ℎ, , ?⟩}
– essentially the measure quantifies the reliability of the explanations: the larger the
support the more reliable and credible the prediction;
– each of the six types of explanation path was evaluated in terms of the adopted
metrics: this allows a quantitative comparison of the diferent settings.
      </p>
      <p>
        Knowledge Graphs For the sake of comparison, the same KGs adopted in the mentioned
original evaluations were considered [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We recall that, since they lack of significant semantic
information actually taken into account by SemanticCrossE, we considered DBpedia15k as
additional dataset for performing further tests in order to stress on the possible utility of the
semantic component or not. Details on the adopted KGs are summarized below:
• WN18 contains 40, 943 entities and 18 relations. It was extracted from WordNet2, where
linguistic relations (e.g., hypernymy, etc.) between synsets/entities are represented;
• FB15k-237 contains 14, 541 entities and 237 relationships. It is a subset of the original
dataset FB15k containing relation triples and textual mentions of Freebase3 entity pairs.
• DBpedia15k contains 12, 862 entities and 279 relations with 180, 000 triples extracted
from DBpedia (see [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]).
      </p>
      <p>
        Parameters Setup For the comparison, as the same algorithm was used in the preliminary
link prediction phase, the settings used for the original evaluation in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] (and also in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]) were
maintained in the new experiments.
      </p>
      <p>
        Specifically, in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] it was suggested to consider a fixed initial number  of similar relations
and  of most similar entities. Clearly, the larger these values, the greater would be the recall,
but also the resulting noise. In the aim of generating explanations of good quality, small values
have been considered:  =  = 3. As regards the semantic score function, the considered
settings for the weights was  = 0.2 and  = 0.8; the motivation is that cosine similarity
applies to the embeddings computed by CrossE, incorporating more latent information learned,
while the semantic measure enforces the similarity complementarily.
2https://wordnet.princeton.edu/
3https://web.archive.org/web/20100228011242/http://www.freebase.com/
      </p>
      <p>
        Finally, also for the settings of the link prediction parameters were the same used in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The
Tensorflow implementation of the model exploited an Adam optimizer and a dropout of 0.5
was applied to the similarity operator (max. number of iterations: 500). Further settings with
parameter values were selected diferently on a per-dataset basis and are reported in Table 1.
The parameter values are the same used for the experimental evaluations in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>As regards the clustering methods considered in the preliminary phase, finding an optimal
number of clusters  may be done beforehand via cross-validation or during their execution.
We preliminarily tested both techniques on various values of . In the following we present
results for  = 8, 10, 15 as larger numbers have been shown to worsen the performance of the
explanation methods, as expected. More complete results are made available in the repository.</p>
      <p>
        Finally, as regards the choice of the similarity measure for the explanation algorithm (and
also by the clustering procedure), the settings involved in the evaluation will be indicated as
orig., cos, acos: the first corresponds to adopting the Euclidean distance, as in the original
approach [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]; in the second setting the cosine similarity is adopted, and the third involves the
approximate semantic Cosine measure [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This was tested only on DBpedia15k since it was the
only KG with semantic annotations. In this case the Manhattan distance has been considered
for clustering as a faster replacement for the semantic cosine similarity.
      </p>
      <p>
        In the following the results of the experiments carried out are summarized and discussed. We
ifrst recall those collected by testing the models with embeddings not grouped in clusters by
similarity [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Then we illustrate and discuss the results of new experiments where clustering
techniques have been exploited for grouping the embeddings into diferent numbers of clusters.
In the experiments described in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] the explanation algorithm was executed on the predicted
triples of each dataset. For each measure, the explanations were produced considering limited
portions (2% and 5%) of the total amount of predictions. This is because very low ranked
predicted triples might turn out to be incorrect and, as a consequence, the corresponding
explanations would turn out to be useless.
      </p>
      <p>Eficiency Gain The results reported in Table 2 show the advantage, in terms of elapsed time,
achieved by prepending the construction of a clustering structure to speed up the retrieval of
neighbouring embeddings, a crucial task for the explanation process.</p>
      <p>Various values for the number of clusters were experimented, varying also the similarity
measure which was also adopted by the two clustering methods. Specifically they were run
using Euclidean distance, cosine similarity and the Manhattan distance as a replacement for
the approximated Cosine similarity additionally considered only in the experiments with
DBpedia15k. As expected, with the usage of clustering, the total elapsed time reduced along
Elapsed time (hh:mm:ss): comparing the original setting without clustering to the ones exploiting the
clusters produced by -Means and Agglomerative for various values of 
dataset</p>
      <p>measure
DBpedia15k</p>
      <p>
        WN18
FB15k
orig.
cos
orig.
cos
alt.
orig.
cos
Results of the experiments with the various settings and measures (no clustering exploited for similarity
search): recall and average support per explanation by path type [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
dataset
      </p>
      <p>measure</p>
      <p>WN18
DBpedia15k</p>
      <p>FB15k-237
orig.
cos
orig.
cos
acos
orig.
cos
%
2%
5%
2%
5%
2%
5%
2%
5%
2%
5%
2%
5%
2%
5%
recall
with the number of clusters that were considered as the number of required comparisons decays.
This happens because each embedding is compared only against those in the same cluster.</p>
      <p>Of course there is a sort of trade-of to be made with efectiveness of the explanation process.
However, considering the efectiveness of the explanation process discussed in the following, it
is possible to conclude that clusterings can yield an appreciable gain in eficiency to the overall
explanation process with no significant loss in efectiveness.</p>
      <p>Efectiveness</p>
      <p>For comparative purposes, the outcomes of the experiments in terms of the
metrics aiming at assessing the efectiveness of the plain explanation process, with no clustering
involved, are recalled in Table 3. Tables 4 and 5 report the outcomes the explanation process
evaluation when preceded by the preliminary clustering phase, involving, respectively, the
two mentioned algorithms and the diferent similarity measures. It is worthwhile to notice
that the fixed number of most similar relationships and most similar entities considered in the
generation of the explanations (see discussion in §4) limits the computational costs but also
the recall. Diferently from the discussion on time, for brevity, the outcomes reported to these
tables are related to experiments with a fixed minimal number of clusters, as larger numbers
tended to worsen the performance of the overall explanation process, as expected.
-Means. Table 4 reports the results with a clustering structure produced by -Means with
 = 8. Considering the outcomes of the experiments with WN18, it can be noticed that in the
case where the Euclidean distance was used the recall is much inferior with respect the original
method with no clustering (see Table 3). Conversely, the results in terms of average support are
more comparable, with some cases in favor of the extended method. In the experiments where
the cosine similarity was used the outcomes in terms of both recall and support are only slightly
inferior w.r.t. those of the original method. In the experiments with DBPedia15k, the outcomes
observed when the Euclidean distance was adopted show no noticeable diference. Hence the
gap previously observed in the experiments with WN18 may be an exception, probably due to the
specific embeddings produced in that case. Indeed no noticeable diference can be appreciated
also in the cases involving -Means clustering in combination with the other two similarity
measures. Similar considerations can be made for the outcomes in terms of avg. support which
presented even some cases (path types) where the extended method performed slightly better.
Examining the outcomes of experiments with FB15k, there is a noticeable improvement observed
brought by the employment of the extended method: in almost all cases the recall doubled while
the gain is even larger in terms of average support in all of the cases considered. A possible
motivation is that, especially with this KG, querying for more explanations is possible as more
relevant embeddings are considered after exploiting the clustering structure.
Agglomerative Clustering. Table 5 presents the results obtained with the same KGs, employing
Agglomerative in its extended version targeting  = 10 clusters. Considering the outcomes
of the experiments with WN18, we observe that the recall measures are quite comparable (with
even a tiny improvement in the 5% sub-case where the cosine measure is adopted). Similar
considerations can be made for the outcomes in terms of average support. In the case of
DBPedia15k, one can observe that recall for the various sub-cases was almost similar. Analogous
considerations can be made for the outcomes in terms of average support where, again, small
improvements were observed for some specific path-types. Finally, in the experiments with
FB15k, we observed more significant diferences of the performance of the method when the
clustering is adopted. Namely the outcomes show a higher recall for all sub-cases. This is even
more apparent in terms of the average support outcomes where, again, major improvements
were recorded for almost all path-types. Regarding these improvements, the considerations
made in the analogous experiments with the diferent clustering method still apply.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Further Extensions</title>
      <p>We have proposed a solution to the problem of generating explanations for link predictions on
KGs. This work presented an integrated structural and semantic approach based on searching
for paths and examples of similar structures that justify the predictions made exploiting an
embedding model. CrossE was adopted as a base embedding model to compute predictions, and
an integrated algorithm based on semantic similarity measures was employed for producing
explanations of the predictions. This procedure was further extended with a preliminary clustering
phase aimed at grouping similar embeddings to improve the eficiency of the recurring key-task
of retrieving neighbors. The solution enhanced with this extension have been experimentally
evaluated, demonstrating that the semantics-aware approach is able to provide more meaningful
explanations, compared to the baseline and that the preliminary clustering phase can speed up
the overall process without degrading its efectiveness.</p>
      <p>A natural further enhancement of the proposed framework will consist in taking into account
additional semantic information in KGs that can be exploited, such as transitivity and symmetry
properties of the relationships. Another extension regarding the clustering phase can come from
adopting classical methods from the related literature that can autonomously decide the number
of clusters, such as nonparametric methods, e.g. those that produce hierarchical structures, such
as ball-trees or kD-trees based on the similarity of the entities.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was partially supported by the project FAIR - Future AI Research (PE00000013), spoke
6 - Symbiotic AI, under the NRRP MUR program funded by the NextGenerationEU.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          , E. Blomqvist,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          , C. d'Amato, G. de Melo,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutiérrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Labra</given-names>
            <surname>Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kirrane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Neumaier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Ngonga</given-names>
            <surname>Ngomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rashid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Schmelzeisen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zimmermann</surname>
          </string-name>
          ,
          <article-title>Knowledge graphs</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>54</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>37</lpage>
          . doi:
          <volume>10</volume>
          .1145/3447772.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Patel-Schneider</surname>
          </string-name>
          (Eds.),
          <source>The Description Logic Handbook: Theory, Implementation and Applications</source>
          , 2nd ed.,
          <source>CUP</source>
          ,
          <year>2007</year>
          . doi:
          <volume>10</volume>
          . 1017/CBO9780511711787.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Paudel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Interaction embeddings for prediction and explanation in knowledge graphs</article-title>
          ,
          <source>in: Proceedings of WSDM</source>
          <year>2019</year>
          , ACM,
          <year>2019</year>
          , pp.
          <fpage>96</fpage>
          -
          <lpage>104</lpage>
          . doi:
          <volume>10</volume>
          .1145/3289600.3291014.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          , J. Han,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <article-title>Logic attention based neighborhood aggregation for inductive knowledge graph embedding</article-title>
          ,
          <source>in: Proceedings of AAAI</source>
          <year>2019</year>
          , AAAI Press,
          <year>2019</year>
          , pp.
          <fpage>7152</fpage>
          -
          <lpage>7159</lpage>
          . doi:
          <volume>10</volume>
          .1609/aaai.v33i01.
          <fpage>33017152</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bhowmik</surname>
          </string-name>
          , G. de Melo,
          <article-title>Explainable link prediction for emerging entities in knowledge graphs</article-title>
          , in: J.
          <string-name>
            <surname>Pan</surname>
          </string-name>
          , et al. (Eds.),
          <source>Proceedings of ISWC</source>
          <year>2020</year>
          , volume
          <volume>12506</volume>
          <source>of LNCS</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>39</fpage>
          -
          <lpage>55</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>030</fpage>
          -62419-
          <issue>4</issue>
          _
          <fpage>3</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Lécué</surname>
          </string-name>
          , J. Wu, Semantic explanations of predictions,
          <year>2018</year>
          . arXiv:
          <year>1805</year>
          .10587.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Färber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bartscherer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Menne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rettinger</surname>
          </string-name>
          ,
          <article-title>Linked data quality of DBpedia, Freebase</article-title>
          , OpenCyc, Wikidata, and
          <string-name>
            <surname>YAGO</surname>
          </string-name>
          ,
          <source>Semantic Web</source>
          <volume>9</volume>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>53</lpage>
          . doi:
          <volume>10</volume>
          .3233/SW-170275.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>C. d'Amato</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Masella</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Fanizzi</surname>
          </string-name>
          ,
          <article-title>An approach based on semantic similarity to explaining link predictions on knowledge graphs</article-title>
          , in: J.
          <string-name>
            <surname>He</surname>
          </string-name>
          , et al. (Eds.),
          <source>Proceedings of WI-IAT</source>
          <year>2021</year>
          , ACM,
          <year>2021</year>
          , pp.
          <fpage>170</fpage>
          -
          <lpage>177</lpage>
          . doi:
          <volume>10</volume>
          .1145/3486622.3493956.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          , E. Cambria,
          <string-name>
            <given-names>P.</given-names>
            <surname>Marttinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>A survey on knowledge graphs: Representation, acquisition, and applications</article-title>
          ,
          <source>IEEE Transactions on Neural Networks and Learning Systems</source>
          <volume>33</volume>
          (
          <year>2022</year>
          )
          <fpage>494</fpage>
          -
          <lpage>514</lpage>
          . doi:
          <volume>10</volume>
          .1109/TNNLS.
          <year>2021</year>
          .
          <volume>3070843</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gad-Elrab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Stepanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Adel</surname>
          </string-name>
          , G. Weikum, Excut:
          <article-title>Explainable embeddingbased clustering over knowledge graphs</article-title>
          , in: J.
          <string-name>
            <surname>Pan</surname>
          </string-name>
          , et al. (Eds.),
          <source>Proceedings of ISWC</source>
          <year>2020</year>
          , volume
          <volume>12506</volume>
          <source>of LNCS</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>218</fpage>
          -
          <lpage>237</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>030</fpage>
          -62419-4\ _
          <fpage>13</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Cross-lingual entity alignment via joint attribute-preserving embedding</article-title>
          , in: C.
          <string-name>
            <surname>d'Amato</surname>
          </string-name>
          , et al. (Eds.),
          <source>Proceedings of ISWC</source>
          <year>2017</year>
          ,
          <string-name>
            <surname>Part</surname>
            <given-names>I</given-names>
          </string-name>
          , volume
          <volume>10587</volume>
          <source>of LNCS</source>
          , Springer,
          <year>2017</year>
          , pp.
          <fpage>628</fpage>
          -
          <lpage>644</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>319</fpage>
          -68288-4\_
          <fpage>37</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>