<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ReliK: A Reliability Measure for Knowledge Graph Embeddings [Extended Abstract]</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maximilian K. Egger</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wenyue Ma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Mottin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Panagiotis Karras</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ilaria Bordino</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Gullo</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aris Anagnostopoulos</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Knowledge Graph Embeddings, Reliability, Knowledge Graphs</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aarhus University, Department of Computer Science</institution>
          ,
          <addr-line>Aabogade 34, 8200 Aarhus</addr-line>
          ,
          <country country="DK">Denmark</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Copenhaghen University, Department of Computer Science</institution>
          ,
          <addr-line>Universitetsparken 1, 2100 København</addr-line>
          ,
          <country country="DK">Denmark</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Dept of Computer, Control, and Management Eng., Sapienza University of Rome</institution>
          ,
          <addr-line>Via Ariosto 25, 00185 Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Unicredit</institution>
          ,
          <addr-line>Via Molfetta 101, 00171 Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Can we assess a priori how well a knowledge graph embedding will perform on a specific downstream task and in a specific part of the knowledge graph? Knowledge graph embeddings (KGEs) represent entities and relationships of a knowledge graph (KG) as vectors. KGEs are generated by optimizing an embedding score, which assesses whether a triple exists in the graph. KGEs have been proven efective in a variety of downstream tasks, including, for instance, predicting relationships among entities. However, the problem of anticipating the performance of a given KGE in a certain downstream task and locally to a specific individual triple, has not been tackled so far. In this paper, we fill this gap with ReliK, a Reliability measure for KGEs. ReliK relies solely on KGE embedding scores, is task- and KGE-agnostic, and requires no KGE retraining. Through extensive experiments, we attest that ReliK correlates well with both common downstream tasks, such as tail or relation prediction and triple classification, and advanced downstream tasks, such as rule mining and question answering, while preserving locality.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Knowledge graphs (KGs) are sets of facts (i.e., triples such as “da Vinci,” “painted,” “Mona Lisa”)
that interconnect entities (“da Vinci,” “Mona Lisa”) via relationships (“painted”) [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Entities
and relationships correspond to nodes and (labeled) edges of the KG, respectively. Knowledge
graph embeddings (KGEs) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] are popular techniques to generate a vector representation for
entities and relationships of a KG. A KGE is computed by optimizing a scoring function that
provides an embedding score as an indication of whether a triple actually exists in the KG. KGEs
have been extensively used as a crucial building block of state-of-the-art methods for a variety
of downstream tasks commonly carried out on the Web, such as knowledge completion [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
whereby a classifier is trained on the embeddings to predict the existence of a triple; or head/tail
prediction [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which aims to predict entities of a triple, as well as more advanced ones, including
rule mining [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], query answering [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and entity alignment [
        <xref ref-type="bibr" rid="ref10 ref11 ref8 ref9">8, 9, 10, 11</xref>
        ].
Motivation. So far, the choice of an appropriate KGE method has depended on the downstream
task, the characteristics of the input KG, and the computational resources. The existence of many
diferent scoring functions, including linear embeddings [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], bilinear [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], based on complex
numbers [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], or projections [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] further complicates this choice. Alas, the literature lacks
a unified measure to quantify how reliable the performance of a KGE method can be for a
certain task beforehand, without performing such a potentially slow task. Furthermore, KGE
performance on a specific downstream task is typically assessed in a global way, that is, in terms
of how accurate a KGE method is for that task on the entire KG. However, the performance of
KGEs for several practical applications (e.g., knowledge completion [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) typically varies across
the parts of the KG. This requires carrying out a performance assessment of KGE locally to
specific parts of the KG, rather than globally.
      </p>
      <p>Contributions. We address all the above shortages of the state of the art in KGE and introduce
ReliK (Reliability for KGEs), a simple, yet principled measure that quantifies the reliability of
how a KGE will perform on a certain downstream task in a specific part of the KG, without
executing that task or (re)training that KGE. To the best of our knowledge, no measure like ReliK
exists in the literature. ReliK relies exclusively on embedding scores as a black box, particularly
on the ranking determined by those scores (rather than the scores themselves). ReliK is simple,
intuitive, and easy-to-implement. Despite that, its exact computation requires processing all the
possible combinations of entities and relationships, for every single fact of interest. We address
this major challenge by devising an approximation to ReliK.</p>
      <p>Apart from experimenting with ReliK in basic downstream tasks, such as entity/relation
prediction, we also showcase ReliK on two advanced downstream tasks, query answering, which
ifnds answers to complex logical queries over KGs, and rule mining, deduces logic rules, with
the purpose of cleaning the KG from spurious facts or expanding the information therein.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
      <p>
        A knowledge graph (KG)  : ⟨ℰ , ℛ, ℱ ⟩ is a triple consisting of a set ℰ of  entities, a set ℛ of
relationships, and a set ℱ ⊂ ℰ × ℛ × ℰ of  facts. A fact is a triple ℎ = (ℎ, , ) , where
ℎ ∈ ℰ is the head,  ∈ ℰ is the tail, and  ∈ ℛ is the relationship. For instance, entities “Leonardo
da Vinci” and “Mona Lisa,” and relationship “painted” form the triple (“Leonardo da Vinci,”
“painted,” “Mona Lisa”). The set ℱ of facts form an edge-labeled graph whose nodes and labeled
edges correspond to entities and relationships, respectively. We say a triple ℎ is positive if it
actually exists in the KG (i.e., ℎ ∈ ℱ ), negative otherwise (i.e., ℎ ∈/ ℱ ).
Knowledge graph embedding. A KG embedding (KGE) [
        <xref ref-type="bibr" rid="ref15 ref3 ref5">3, 5, 15</xref>
        ] is a representation of entities
and relationships in a -dimensional (≪|ℰ | ) space, typically, the real R space or the complex
C space. For instance, TransE [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] represents a triple ℎ as entity vectors eℎ, e ∈ R and
relation vector e ∈ R, and DistMult [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] represents the relationship as a matrix W ∈ R× .
Although KGEs can difer (significantly) from one another in their definition, a common key
aspect of all KGEs is that they are typically defined based on a so-called embedding scoring
function or simply embedding score. This is a function  : ℰ × ℛ × ℰ → R, which quantifies
how likely a triple ℎ ∈ ℰ × ℛ × ℰ exists in  based on the embeddings of its head (ℎ),
relationship (), and tail (). Specifically, the higher (ℎ), the more likely the existence of
ℎ. For instance, TransE’s score (ℎ) = −‖ eℎ + e − e‖ is the (ℓ1 or ℓ2) distance between
the “translation” from ℎ’s embedding to ’s embedding through ’s embedding [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        KGEs are typically learned via optimization (e.g., gradient descent) of a loss function defined
based on the embedding score. This training process can be computationally expensive,
especially if it has to be repeated for multiple KGEs. KGEs learned this way are shown to be efective
for a number of downstream tasks [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], such as predicting the existence of a triple, but do not
ofer any prior indication on their performance [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Moreover, existing benchmarks [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] show
global performance on the entire graph rather than local on subgraphs. To this end, in this
work, we provide an answer to the following key question: Is there a measure that provides a
prior indication on the performance of an embedding on a specific subgraph?
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. KGE Reliability</title>
      <p>We would like a measure of reliability that properly assesses how the embedding of a triple
would perform on certain tasks and data, without knowing them in advance. To this end, we
aim to fulfill the following desiderata for a KGE reliability measure.
(R1) Embedding-agnostic. Is independent of the specific KGE method. This is to have a
measure fully general. (R2) Learning-free. Requires no further KGE training. This is primarily
motivated by eficiency, but also for other reasons, such as privacy or unavailability of the data
used for KGE training. (R3) Task-agnostic. Is independent of the specific downstream task. by
anticipating the performance of a KGE in general, for any downstream task. (R4) Locality. Is
a good predictor of KGE performance locally to a given triple, that is, in a close surrounding
neighborhood of that triple. This is important, as a KGE model may be more or less efective
based on the diferent parts of the KG it is applied to.</p>
      <sec id="sec-3-1">
        <title>3.1. The Proposed ReliK Measure</title>
        <p>Design principles. To fulfill (R1) and (R2), the KGE reliability measure should not engage
with the internals of the computation of KGEs. Thus, we need to treat the embeddings as
vectors and the embedding score as a black-box function that provides only an indication of
the actual existence of a triple. Although the absolute embedding scores are incomparable to
one another, we observe that the distribution of positive and negative triples is significantly
diferent. Specifically, we assume the relative ranking of a positive triple to be higher than
that of a negative. Otherwise, we multiply the score by − 1. This leads to the following main
observation.</p>
        <p>Observation 1. A KGE reliability measure that uses the position of a triple relative to other triples
via a ranking defined based on the embedding score fulfills (R1) and (R2).</p>
        <p>Furthermore, comparing a triple to all other (positive or negative) triples might be inefective.
This is because the absolute score does not provide an indication of performance. We thus
advocate that a local approach that considers triples relative to a neighborhood is more
appropriate, and propose a measure that fulfills (R4). The soundness of (R4) is better attested in our
experiments in Section 4. To meet (R3), the KGE reliability measure should not exploit any
peculiarity of a downstream task in its definition. Indeed, this is accomplished by our measure,
as we show next.</p>
        <sec id="sec-3-1-1">
          <title>Definition.</title>
          <p>negative triples, that is, triples with head ℎ that do not exist in . Similarly, we compute  − ()
for tail . We define the</p>
          <p>head-rank ℎ of a triple ℎ as the position of the triple in the rank
obtained using score  for a specific KGE relative to all the negative triples having head ℎ.</p>
          <p>For a triple ℎ = (ℎ, , ) we compute the neighbor set  − (ℎ) of all possible
rank  (ℎ) = ⃒⃒ { ∈  − (ℎ) : () &gt; (ℎ)}⃒⃒ + 1
The tail-rank rank  (ℎ) for tail  is defined similarly.
reciprocal of the head- and tail-rank</p>
          <p>Our reliability measure, ReliK, for a triple ℎ is ultimately defined as the average of the
ReliK(ℎ) =
1 (︂
2</p>
          <p>1
rank  (ℎ)
+</p>
          <p>1
rank  (ℎ)
︂)
We define the ReliK score of a set  ⊆ ℱ</p>
          <p>of triples as the average ReliK of the triples in the
set.</p>
          <p>Rationale. ReliK ranges from (0, 1], with higher values corresponding to better reliability. In
fact, the lower the head-rank rank  (ℎ) or tail-rank rank  (ℎ), the better the ranking of
ℎ induced by the underlying embedding scores, relatively to the nonexisting triples in ℎ’s
neighborhood, complies with the actual existence of ℎ in the KG.</p>
          <p>It is easy to see that ReliK achieves (R1) and (R2) by relying on the relative ranking rather
than the absolute scores. It also fulfills (R3) as it involves no downstream tasks at all, and (R4)
as it is based on the local (i.e., 1-hop) neighborhood of a target triple.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Eficiently Computing</title>
        <p>ReliK</p>
        <p>We define our estimator as
Computing ReliK (Eq. (1)) requires Ω (|ℰ | · |ℛ| ) time, as it needs to scan the entire negative
neighborhood of the target triple. For large KGs, repeating this for a (relatively) high number
of triples may be computationally too heavy. For this purpose, here we focus on approximate
versions of ReliK, which properly trade of between accuracy and eficiency.</p>
        <p>The main intuition behind the ReliK approximation is that the precise ranking of the various
potential triples is not actually needed. Rather, what it matters is just the number of those triples
that exhibit a higher embedding score than the target triple. As such, we estimate ReliK by
evaluating the fraction of triples in the sample that have a higher score than the triple under
consideration and then scaling this fraction to the total number of negative triples.</p>
        <p>Let  be a random subset of  elements selected without replacement independently and
uniformly at random from the negative neighborhood  − (ℎ) of the head ℎ of a triple ℎ.
The size | | trades of between eficiency and accuracy of the estimator, and it may be defined
based on the size of  − (ℎ). Define also rank  (ℎ) = |{ ∈  : () &gt; (ℎ)}| + 1, to
be the rank of the score (ℎ) that the KGE assigns to ℎ, among all the triples in the sample.
We similarly compute  and rank  for tail’s neighborhood  − ().</p>
        <p>ReliKApx =
rank  (ℎ) | − (ℎ)| )︂ − 1
rank  (ℎ) | − ()| )︂ − 1]︃
In words, we simply scale up the rank induced by the sample to the entire set of negative triples.
Theorem 1. Equation 2 is an upper bound for ReliK; that is E[ReliKApx(ℎ)] ≥
ReliK(ℎ).</p>
        <p>
          Theorem 1 follows immediately from the Jensen’s inequality [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] and the fact that
E[rank  (ℎ)] = | | ·
rank (ℎ) .
        </p>
        <p>| − (ℎ)|
Algorithm. To compute ReliKApx, we first sample, uniformly at random,  negative triples
from the head neighborhood and the tail neighborhood. We can save computation time by first
or the tail in common with ℎ to update the corresponding rank.
ifltering the triples in  ∪  by score, that is, by considering only those with score higher
than the input triple ℎ, and then checking whether a triple in  ∪  has either the head
triple.The sample size  trades of between accuracy and eficiency of the estimation.
Time complexity. ReliKApx runs in () time to sample the negative triples for each positive</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental evaluation</title>
      <p>
        We evaluate ReliK on four downstream tasks, six embeddings, and four datasets.
Embeddings. We include six established KGE methods, TransE [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], DistMult [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], RotatE [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
PairRE [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], ComplEx [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], ConvE [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] representative of the major embedding families (Sec. 5).
Datasets. We perform experiments on KGs with diferent characteristics:
      </p>
      <sec id="sec-4-1">
        <title>Countries [20] is</title>
        <p>
          a small graph from geographical locations; FB15k237 [21] is a sample of Freebase KG [22]
with 14k nodes, 310k facts, and 237 relationships; Codex [23] is a collection of three datasets
of incremental size, Codex-S (2 entities, 36 triples), Codex-M (17 entities, 200 facts), and
Codex-L (78 entities, 610 facts) extracted from Wikidata and Wikipedia.; YAGO2 [24] is a
KG automatically extracted from Wikidata, which comprises 834k entities and 948k facts. .
Experimental setup. We implement our approximate and exact ReliK in Python v3.9.13 (Code
at: https://github.com/AU-DIS/ReliK). We train the embedding with Pykeen v1.10.1 [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], with
default parameters besides the embedding dimension  = 50 and training loop sLCWA. We
report an average of 5 experiments using 5-fold cross validation with 80-10-10 split.
        </p>
        <sec id="sec-4-1-1">
          <title>4.1. Common Downstream Tasks</title>
          <p>
            We test ReliK on the ability to anticipate the results of common tasks for KGEs [
            <xref ref-type="bibr" rid="ref3 ref5">5, 3</xref>
            ]. We
measure the statistical significance of Pearson correlation among two ranking tasks, tail and
relation prediction, and the triple classification task. To evaluate
ReliK on diferent areas of the
graph and diferent graph topologies, we sample random subgraphs of Codex-S with
60 nodes
by initially selecting a starting node uniformly at random and then including nodes and edges
by a random walk with restart [25] with restart probability 1
= 0.2, until the subgraph
comprises 60 nodes. For Codex-M and Codex-L we use size 100 and for FB15k237 we use 200
− 
nodes. We report the average ReliK on 100 random subgraphs.
          </p>
          <p>
            Ranking tasks (T1). In the first experiments, we measure the Pearson correlation between
ReliK and the performance on ranking tasks with mean reciprocal rank (MRR) [26]. The first
task, tail prediction [
            <xref ref-type="bibr" rid="ref12 ref13 ref14">12, 14, 13</xref>
            ], assesses the ability of the embedding to predict the tail given
the head and the relation, thus answering the query (ℎ, , ?) where the tail is unknown. The
second task, relation prediction, assesses the ability of the embedding to predict the undisclosed
relation of a triple (ℎ, ?, ). The common measure used for tail and relation prediction is MRR,
which provides an indication of how close to the top the score ranks the correct tail (or relation).
Consistently with previous approaches [
            <xref ref-type="bibr" rid="ref12 ref13 ref14">12, 14, 13</xref>
            ], we employ the filtered approach in which
we consider for evaluation only negative triples that do not appear in either the train, test, or
validation set. Table 1 reports the correlations alongside the statistical significance in terms
of the p-value. We marked in red, high p-values (&gt; 0.05), which suggest no correlation, and
Pearson score values that indicate inverse correlation. Generally, ReliK exhibits significant
correlation across embeddings and tasks. Noteworthy, even though ReliK (see Eq. (1)) does not
explicitly target tail or head rankings by including both, we observe significant correlation on
tail prediction in most embeddings and datasets. Because of the considerable training time, we
only report results for RotatE on Codex-S. Comparing the actual results of the various tasks, it
is also clear in most cases in which we do not have correlation that the results are too close
to distinguish; for example, ComplEx having only results close to 0. In such cases, the results
indicate that the particular embedding method needs additional training.
          </p>
          <p>Tail (MRR)</p>
          <p>Relation (MRR)</p>
          <p>Classific. (Acc.)
KGE</p>
          <p>Pearson p-value</p>
          <p>Pearson p-value</p>
          <p>Pearson
p-value
TransE 0.83
LDistMult 0.49
-xRotatE –
eodPairRE − 0.04
CComplEx 0.82
ConvE 0.59
TransE 0.24
7DistMult − 0.05
23kRotatE –
51PairRE 0.80
FBComplEx 0.20
ConvE 0.09
1.13− 26 0.97
2.10− 07 0.78
– –
0.68 0.95
1.03− 25 0.91
4.26− 11 − 0.07
Tuning Subgraph Size. Next, we analyze how ReliK tail relation classifier
correlates with the tasks presented in Section 4.1 on n0.8
subgraphs of varying size with the TransE embedding. rso0.6
Figure 1 reports the correlation values for all three tasks, eaP0.4
only including those values where the p-value is be- 0 20 40 60 80 100 120 140
low 0.05. We observe that ReliK’s correlation generally Subgraph size
increases with subgraphs of up to 100 nodes on Codex- Figure 1: Correlation vs subgraph size
S. After that point, we note an unstable behavior in all on Codex-S.
tasks. This is consistent with the assumption that ReliK is a measure capturing local reliability.
To strike a balance between quality and time we test on subgraphs with 60 nodes for Codex-S
in all experiments. Yet, as tasks are of diferent nature, the subgraph size can be tuned in
accordance with the task to provide more accurate results.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.2. Complex Downstream Tasks</title>
          <p>Query answering (T3). We show how ReliK can improve query-answering tasks. Complex
logical queries on KGs are working with diferent query structures. We focus on queries of
chaining multiple predictions or having an intersection of predictions, from diferent query
structures that have been described in recent work [27, 28]. We keep the naming
convention introduced by Ren and Leskovec [27]. We evaluate a selection of 1000 queries per type
(1,2,3,2,3) from their data on the FB15k237 graph.The queries of type  are 1 to 3 hops
from a given entity with fixed relation labels that point to a solution, whereas queries of type 
are the intersection of 2 or 3 predictions pointing towards the same entity. We evaluate ReliK
on the ability to detect whether an instance of an answer is true or false. We compute ReliK on
TransE embeddings trained on the entire FB15k237. Figure 2 shows the average ReliK scores
for positive and negative answers. ReliK clearly discriminates between positive and negative
instances, often by a large margin.</p>
          <p>· 10− 5Query Answering</p>
          <p>Detecting true instances. To showcase performance on the downstream task (T4), we compare
ReliK with the reciprocal rank (RR) of a combination of the tail and the relation embeddings
on the ability to detect whether an instance of a rule is true or false. This task is particularly
important to quantify the real confidence of a rule [ 33]. To this end, we use a dataset [34]
comprising 23 324 manually annotated instances over 26 rules extracted from YAGO2 using
the AMIE [29] and RudiK [31] methods. We compute ReliK on TransE embeddings trained on
the entire YAGO2. Figure 2 shows the average ReliK scores for positive and negative instances.
ReliK discriminates between positive and negative instances, often by a large margin, whereas
RR often confounds positive and negative instances.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Related Work</title>
      <p>
        Knowledge graph embeddings are commonly used to detect missing triples, correcting
errors, or question answering [
        <xref ref-type="bibr" rid="ref3 ref5">3, 5</xref>
        ]. Translational embeddings in the TransE [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] family and
the recent PairRE [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] assume that the relationship performs a translation from the head to
the tail. Semantic embeddings, such as DistMult [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] or HolE [35], interpret the relationship
as a multiplicative operator. Complex embeddings, such as RotatE [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and ComplEx [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], use
complex-valued vectors and operations in the complex plane. Neural-network embeddings, such
as ConvE [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], perform sequences of nonlinear operations.
      </p>
      <p>
        Embedding calibration. An orthogonal direction to ours is embedding calibration [36, 37].
Calibration methods provide efective ways to improve the existing embeddings on various
tasks, by altering the embedding vectors in subspaces with low accuracy [36], by reweighing
the output probabilities in the respective tasks [37], or by matrix factorization [38].
Evaluation of embeddings. ReliK bears an interesting connection with ranking-based quality
measures, in particular with the mean reciprocal rank (MRR) and HITS@k for head, tail, and
relation prediction [
        <xref ref-type="bibr" rid="ref12 ref13 ref14 ref3">3, 36, 39, 12, 14, 13</xref>
        ]. Even though MRR and HITS@k provide a global
indication of performance, ReliK is suitable for local analysis. Yet, current global measures have
been recently shown to be biased towards high-degree nodes [40].
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>Aiming to develop a measure that prognosticates the performance of a knowledge graph
embedding on a specific subgraph, we introduced ReliK, a KGE reliability measure agnostic
to the choice of the embeddings, the dataset, and the task. To allow for eficient computation,
we proposed a sampling-based approximation, which we show to achieve similar results to
the exact ReliK in a fraction of the time. Our experiments confirm that ReliK anticipates the
performance on a number of common and complex downstream tasks for KGEs. These results
suggest that ReliK may be used in other domains, as well as a debugging tool for KGEs.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>M. Egger is supported by the Danish Council for Independent Research grant DFF-1051-00062B.
W. Ma is supported by the China Scholarship Council grant 202110320012.
[20] G. Bouchard, S. Singh, T. Trouillon, On approximate reasoning capabilities of low-rank
vector spaces, in: AAAI, 2015.
[21] K. Toutanova, D. Chen, Observed versus latent features for knowledge base and text
inference, in: Proceedings of the 3rd workshop on continuous vector space models and
their compositionality, 2015, pp. 57–66.
[22] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, J. Taylor, Freebase: a collaboratively created
graph database for structuring human knowledge, in: SIGMOD, 2008, pp. 1247–1250.
[23] T. Safavi, D. Koutra, Codex: A comprehensive knowledge graph completion benchmark,
in: EMNLP, 2020, pp. 8328–8350.
[24] J. Hofart, F. M. Suchanek, K. Berberich, G. Weikum, Yago2: A spatially and temporally
enhanced knowledge base from wikipedia, Artificial intelligence 194 (2013) 28–61.
[25] H. Tong, C. Faloutsos, J.-Y. Pan, Fast random walk with restart and its applications, in:</p>
      <p>ICDM, IEEE, 2006, pp. 613–622.
[26] N. Craswell, Mean reciprocal rank., Encyclopedia of database systems 1703 (2009).
[27] H. Ren, J. Leskovec, Beta embeddings for multi-hop logical reasoning in knowledge graphs,</p>
      <p>Advances in Neural Information Processing Systems 33 (2020) 19716–19726.
[28] Y. Bai, X. Lv, J. Li, L. Hou, Answering complex logical queries on knowledge graphs via
query computation tree optimization (2023).
[29] L. A. Galárraga, C. Teflioudi, K. Hose, F. M. Suchanek, AMIE: association rule mining
under incomplete evidence in ontological knowledge bases, in: TheWebConf, 2013, pp.
413–422.
[30] L. Galárraga, C. Teflioudi, K. Hose, F. M. Suchanek, Fast rule mining in ontological
knowledge bases with amie ++, VLDBJ 24 (2015) 707–730.
[31] S. Ortona, V. V. Meduri, P. Papotti, Robust discovery of positive and negative rules in
knowledge bases, in: ICDE, 2018, pp. 1168–1179.
[32] R. Agrawal, R. Srikant, et al., Fast algorithms for mining association rules, in: VLDB,
volume 1215, Santiago, Chile, 1994, pp. 487–499.
[33] M. Loster, D. Mottin, P. Papotti, J. Ehmüller, B. Feldmann, F. Naumann, Few-shot knowledge
validation using rules, in: TheWebConf, 2021, pp. 3314–3324.
[34] M. Loster, D. Mottin, P. Papotti, J. Ehmüller, B. Feldmann, F. Naumann, Colt dataset, 2021.</p>
      <p>URL: https://hpi.de/naumann/projects/repeatability/datasets/colt-dataset.html, accessed
on November 1, 2023.
[35] M. Nickel, V. Tresp, H.-P. Kriegel, et al., A three-way model for collective learning on
multi-relational data., in: ICML, volume 11, 2011, pp. 3104482–3104584.
[36] T. Safavi, D. Koutra, E. Meij, Evaluating the calibration of knowledge graph embeddings
for trustworthy link prediction, in: EMNLP, 2020.
[37] P. Tabacof, L. Costabello, Probability calibration for knowledge graph embedding models,
in: ICLR, 2020.
[38] C. Demir, J. Lienen, A.-C. Ngonga Ngomo, Kronecker decomposition for knowledge graph
embeddings, in: HT, 2022, pp. 1–10.
[39] F. Bianchi, G. Rossiello, L. Costabello, M. Palmonari, P. Minervini, Knowledge graph
embeddings and explainable ai, in: Knowledge Graphs for eXplainable Artificial Intelligence:
Foundations, Applications and Challenges, IOS Press, 2020, pp. 49–72.
[40] S. Tiwari, I. Bansal, C. R. Rivero, Revisiting the evaluation protocol of knowledge graph
completion methods for link prediction, in: TheWebConf, 2021, pp. 809–820.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          , E. Blomqvist,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          , C. d'Amato, G. de Melo,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kirrane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E. L.</given-names>
            <surname>Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Neumaier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Rashid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Schmelzeisen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zimmermann</surname>
          </string-name>
          ,
          <article-title>Knowledge graphs</article-title>
          ,
          <source>ACM CSUR 54</source>
          (
          <year>2022</year>
          )
          <volume>71</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>71</lpage>
          :
          <fpage>37</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          , Knowledge graphs
          <year>2021</year>
          :
          <article-title>A data odyssey</article-title>
          ,
          <source>PVLDB</source>
          <volume>14</volume>
          (
          <year>2021</year>
          )
          <fpage>3233</fpage>
          -
          <lpage>3238</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>Knowledge graph embedding: A survey of approaches and applications</article-title>
          ,
          <source>TKDE</source>
          <volume>29</volume>
          (
          <year>2017</year>
          )
          <fpage>2724</fpage>
          -
          <lpage>2743</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Usman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guan</surname>
          </string-name>
          , S. Liu, T. Wu,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Knowledge graph quality control: A survey</article-title>
          ,
          <source>Fundamental Research</source>
          <volume>1</volume>
          (
          <year>2021</year>
          )
          <fpage>607</fpage>
          -
          <lpage>626</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          , E. Cambria,
          <string-name>
            <given-names>P.</given-names>
            <surname>Marttinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Y.</given-names>
            <surname>Philip</surname>
          </string-name>
          ,
          <article-title>A survey on knowledge graphs: Representation, acquisition, and applications</article-title>
          ,
          <source>Trans. Neural Netw. Learn. Syst</source>
          .
          <volume>33</volume>
          (
          <year>2021</year>
          )
          <fpage>494</fpage>
          -
          <lpage>514</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          , S. W.-t. Yih,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <article-title>Embedding entities and relations for learning and inference in knowledge bases</article-title>
          ,
          <source>in: ICLR</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. Zhang,</surname>
          </string-name>
          <article-title>A holistic approach for answering logical queries on knowledge graphs</article-title>
          ,
          <source>in: ICDE</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>2345</fpage>
          -
          <lpage>2357</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Bhowmick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. C.</given-names>
            <surname>Dragut</surname>
          </string-name>
          , W. Meng,
          <article-title>Globally aware contextual embeddings for named entity recognition in social media streams</article-title>
          ,
          <source>in: ICDE</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>1544</fpage>
          -
          <lpage>1557</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ren</surname>
          </string-name>
          , W. Hu,
          <article-title>Deep active alignment of knowledge graph entities and schemata</article-title>
          ,
          <source>PACMMOD</source>
          <volume>1</volume>
          (
          <year>2023</year>
          )
          <volume>159</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>159</lpage>
          :
          <fpage>26</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zeakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Papadakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Skoutas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Koubarakis</surname>
          </string-name>
          ,
          <article-title>Pre-trained embeddings for entity resolution: An experimental analysis</article-title>
          ,
          <source>PVLDB</source>
          <volume>16</volume>
          (
          <year>2023</year>
          )
          <fpage>2225</fpage>
          -
          <lpage>2238</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dou</surname>
          </string-name>
          ,
          <article-title>Semantics driven embedding learning for efective entity alignment</article-title>
          ,
          <source>in: ICDE</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>2127</fpage>
          -
          <lpage>2140</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Usunier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garcia-Duran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Yakhnenko</surname>
          </string-name>
          ,
          <article-title>Translating embeddings for modeling multi-relational data</article-title>
          ,
          <source>NeurIPS</source>
          <volume>26</volume>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <article-title>Rotate: Knowledge graph embedding by relational rotation in complex space</article-title>
          ,
          <source>in: ICLR</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <article-title>Pairre: Knowledge graph embeddings via paired relation vectors</article-title>
          ,
          <source>in: ACL</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>4360</fpage>
          -
          <lpage>4369</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Berrendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. T.</given-names>
            <surname>Hoyt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Vermue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Galkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sharifzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tresp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <article-title>Bringing light into the dark: A large-scale evaluation of knowledge graph embedding models under a unified framework</article-title>
          ,
          <source>TPAMI</source>
          <volume>44</volume>
          (
          <year>2021</year>
          )
          <fpage>8825</fpage>
          -
          <lpage>8845</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Kalo</surname>
          </string-name>
          , W.-T. Balke,
          <string-name>
            <given-names>R.</given-names>
            <surname>Krestel</surname>
          </string-name>
          ,
          <article-title>Do embeddings actually capture knowledge graph semantics?</article-title>
          , in: ESWC,
          <year>2021</year>
          , pp.
          <fpage>143</fpage>
          -
          <lpage>159</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J. L. W. V.</given-names>
            <surname>Jensen</surname>
          </string-name>
          ,
          <article-title>Sur les fonctions convexes et les inégalités entre les valeurs moyennes</article-title>
          ,
          <source>Acta mathematica 30</source>
          (
          <year>1906</year>
          )
          <fpage>175</fpage>
          -
          <lpage>193</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.</given-names>
            <surname>Trouillon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Welbl</surname>
          </string-name>
          , S. Riedel, É. Gaussier, G. Bouchard,
          <article-title>Complex embeddings for simple link prediction</article-title>
          , in: ICML, PMLR,
          <year>2016</year>
          , pp.
          <fpage>2071</fpage>
          -
          <lpage>2080</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T.</given-names>
            <surname>Dettmers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Minervini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stenetorp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <article-title>Convolutional 2d knowledge graph embeddings</article-title>
          ,
          <source>in: AAAI</source>
          , volume
          <volume>32</volume>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>