<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Corresponding author. Email addresses: nicolas.hubert@loria.fr (N. Hubert); pierre.monnin@orange.com (P. Monnin); armelle.brun@loria.fr (A. Brun); davy.monticolo@univ-lorraine.fr (D. Monticolo)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Knowledge Graph Embeddings for Link Prediction: Beware of Semantics!</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nicolas Hubert</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierre Monnin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Armelle Brun</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davy Monticolo</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Orange</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Université de Lorraine</institution>
          ,
          <addr-line>CNRS, LORIA</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Université de Lorraine</institution>
          ,
          <addr-line>ERPI</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>The task of predicting links in knowledge graphs (KGs) can be tackled using knowledge graph embedding models (KGEMs). Such models project the entities and relations of a KG into a low-dimensional vector space that preserves the properties of the graph as much as possible. The performance of KGEMs for link prediction is traditionally assessed using rank-based metrics that evaluate the ability of models to give high scores to ground-truth entities. However, these metrics ignore the other scored entities. This is a shortcoming in application domains where consistency among the top-scored entities must be ensured. To this end, in this paper we propose to measure the ability of popular KGEMs to capture the semantic profile of relations. In particular, we use Sem@K, a semantic-oriented metric that assesses whether top-scored entities are semantically valid. Our experiments show that agnostic KGEMs are actually able to learn the semantic profile of relations. This raises the opportunity of using Sem@K as an additional training criterion.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graph Embeddings</kwd>
        <kwd>Link Prediction</kwd>
        <kwd>Rank-Based Metrics</kwd>
        <kwd>Semantic-Oriented Metrics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        A knowledge graph (KG) is a collection of triples (h, r, t) where h (head) and t (tail) are two
entities of the graph, and r is a predicate (also called relation) that qualifies the relationship
holding between them. KGs support several tasks including entity matching, question answering,
and link prediction [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The latter is the focus of this paper. Given a triple (?, r, t) (resp. (h, r, ?)),
link prediction (LP) consists of predicting the most plausible head h (resp. tail t). Knowledge
Graph Embedding Models (KGEMs) address this particular task by projecting entities and
relations of the KG into a low-dimensional vector space that preserves as much as possible
the properties of the graph [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Training a KGEM first requires corrupting existing triples by
replacing either their head h or their tail t with another entity to generate negative counterparts.
This procedure is called negative sampling [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3, 4, 5</xref>
        ]. Second, the KGEM iteratively learns to
assign higher scores to true triples than to their negative counterparts.
      </p>
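      <p>The corruption step described above can be sketched as follows. This minimal Python illustration assumes uniform negative sampling: triples are (head, relation, tail) tuples of strings, and the head or the tail is replaced, with equal probability, by a different entity drawn uniformly at random. The function <monospace>corrupt</monospace> and the toy triples are hypothetical and are not taken from the authors' code.</p>
      <preformat preformat-type="code">
```python
import random


def corrupt(triple, entities, rng):
    """Return a negative counterpart of `triple` by uniform negative sampling.

    Either the head or the tail is replaced (with equal probability) by a
    different entity drawn uniformly from `entities`, which is assumed to
    contain at least two entities.
    """
    head, relation, tail = triple
    replace_head = rng.random() >= 0.5
    original = head if replace_head else tail
    candidate = original
    while candidate == original:  # resample until the entity actually changes
        candidate = rng.choice(entities)
    if replace_head:
        return (candidate, relation, tail)
    return (head, relation, candidate)


entities = ["BarackObama", "MaliaObama", "USA", "Hawaii"]
positive = ("BarackObama", "isFatherOf", "MaliaObama")
rng = random.Random(0)
negatives = [corrupt(positive, entities, rng) for _ in range(3)]  # one negative per call
print(negatives)
```
      </preformat>
      <p>Because the replacement entity is drawn blindly, such a negative may still be a true but unobserved fact, and it may also violate the domain or range of the relation.</p>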
      <p>
        The performance of KGEMs for LP is ultimately
evaluated using rank-based metrics such as Hits@K, Mean Rank (MR), and Mean Reciprocal
Rank (MRR) that evaluate whether ground-truth entities are indeed given higher scores [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
However, several recent works have raised caveats about such metrics [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ]. Indeed,
they not only lack a theoretical grounding (see Section 2.1) but are also not well-suited for
drawing comparisons across datasets [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. More importantly, they only provide a partial picture
of KGEM performance [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. For instance, LP can lead to nonsensical triples, such as (BarackObama,
isFatherOf, USA), being predicted as highly plausible facts, although they violate constraints
on the domain and range of relations [
        <xref ref-type="bibr" rid="ref5 ref9">5, 9</xref>
        ]. KGEMs with such issues may nevertheless reach a
satisfactory performance in terms of rank-based metrics, since these violations are not taken into
account by the metrics.
      </p>
      <p>
        Few works go beyond the traditional quantitative evaluation of KGEMs
and address their ability to capture the semantics of the original KG [
        <xref ref-type="bibr" rid="ref10">10, 11, 12</xref>
        ]. This is why
we advocate for additional qualitative and semantic-oriented metrics to supplement traditional
rank-based metrics. According to Berrendorf et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], this would give a more complete picture of
the performance of a KGEM. More precisely, such semantic-oriented metrics make it possible to investigate
dimensions that remain unexplored in the literature and yet deserve greater attention. For
example, given the domain and range of a relation, such metrics would help assess the need
for post-filtering candidate entities that do not belong to the expected types. Indeed, if the
KGEM is not able to rank entities of the correct types higher without supervision, a
post-filtering phase may be needed to ensure consistency in the predictions.
      </p>
      <p>Accordingly, in this work, our goal is to assess the ability of popular KGEMs to capture the
semantic profile (i.e., domain and range) of relations in a link prediction task. To do so, we build
on Sem@K, the semantic-oriented metric that we introduced in [13] for a recommendation
task. In this previous work, Sem@K was used to evaluate the ability of KGEMs to recommend
items of the expected type. Here, we extend the scope of the metric to the more
generic LP task.</p>
      <p>The remainder of the paper is structured as follows. Related work is presented in Section 2.
In Section 3, we detail the KGEMs used in this work as well as the semantic-oriented metric
Sem@K that we tailor to the LP task. Dataset descriptions, experimental settings, and key
findings are provided in Section 4. Lastly, Section 5 outlines future directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Evaluating KGEM Performance for Link Prediction</title>
        <p>
          KGEM performance is almost exclusively assessed using the following rank-based metrics:
Hits@K, Mean Rank (MR), and Mean Reciprocal Rank (MRR) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. We recall their definitions
and discuss their limits below. The use of such metrics stems from the fact that training and
evaluating a KGEM requires generating negative triples [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Thus, positive triples are scored
against negative ones to determine whether the model is able to predict plausible facts. More
specifically, given a ground-truth triple (h, r, t), all possible triples (?, r, t) and (h, r, ?) are
generated with all the entities observed in the KG. Then, such triples are scored by the KGEM
and their scores are compared with the score given to the ground-truth triple.
        </p>
        <p>Hits@K (Equation (1)) accounts for the proportion of ground-truth triples appearing in the
first K top-scored triples:
<disp-formula><label>(1)</label><tex-math>\mathrm{Hits@}K = \frac{1}{|\mathcal{B}|} \sum_{q \in \mathcal{B}} \mathbf{1}\left[\mathrm{rank}(q) \leq K\right]</tex-math></disp-formula>
where ℬ is the batch of ground-truth triples, rank(q) is the position of the ground-truth triple
q in the sorted list of triples, and 1[rank(q) ≤ K] yields 1 if q is ranked between 1 and K, and 0
otherwise. This metric is bounded in the [0, 1] range and its value never decreases as K grows; the
higher, the better.</p>
        <p>Mean Rank (MR) (Equation (2)) corresponds to the arithmetic mean over the ranks of the
ground-truth triples:
<disp-formula><label>(2)</label><tex-math>\mathrm{MR} = \frac{1}{|\mathcal{B}|} \sum_{q \in \mathcal{B}} \mathrm{rank}(q)</tex-math></disp-formula>
This metric is bounded in the [1, |ℰ|] interval, where |ℰ| stands for the number of entities in the
KG; the lower, the better.</p>
        <p>
          Mean Reciprocal Rank (MRR) (Equation (3)) corresponds to the arithmetic mean over the
reciprocals of the ranks of the ground-truth triples:
<disp-formula><label>(3)</label><tex-math>\mathrm{MRR} = \frac{1}{|\mathcal{B}|} \sum_{q \in \mathcal{B}} \frac{1}{\mathrm{rank}(q)}</tex-math></disp-formula>
Contrary to MR, MRR is bounded in the [0, 1] interval; the higher, the better.
Unlike Hits@K, MRR does not rely on any threshold K, and it is less sensitive to outliers than MR,
since a single very large rank contributes almost nothing to the mean of reciprocal ranks.
In addition, MRR is often used for performing early stopping and for tracking the best
epoch during training [
          <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
          ].
        </p>
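        <p>The three rank-based metrics can be computed directly from the ranks of the ground-truth triples. The Python sketch below assumes that 1-based ranks have already been obtained by scoring all candidate entities; the function names are illustrative.</p>
        <preformat preformat-type="code">
```python
def hits_at_k(ranks, k):
    """Equation (1): proportion of ground-truth triples ranked in the top k."""
    return sum(1 for r in ranks if k >= r) / len(ranks)


def mean_rank(ranks):
    """Equation (2): arithmetic mean of the ranks; the lower, the better."""
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Equation (3): arithmetic mean of the reciprocal ranks; the higher, the better."""
    return sum(1.0 / r for r in ranks) / len(ranks)


ranks = [1, 3, 12, 2]  # hypothetical ranks of four ground-truth triples
print(hits_at_k(ranks, 10), mean_rank(ranks), mean_reciprocal_rank(ranks))
```
        </preformat>
        <p>For instance, a model ranking a ground-truth triple at position 11 and another ranking it at position 1010 both obtain Hits@10 = 0, whereas their MR and MRR differ markedly, which illustrates the threshold issue of Hits@K.</p>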
        <p>
          As mentioned in Section 1, these metrics present some caveats. LP is often used for knowledge
base completion, where the Open World Assumption (OWA) prevails. KGs are
incomplete and, under the OWA, an unobserved triple used as a negative one may actually be positive.
It follows that traditional evaluation methods based on rank-based metrics may systematically
underestimate the true performance of a KGEM [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. In addition, the aforementioned rank-based
metrics have intrinsic and theoretical flaws, as pointed out in several works [
          <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
          ]. For example,
Hits@K does not take into account triples whose rank is larger than K. As such, a model
ranking the ground truth in position K + 1 would be considered as good as another model
ranking it in position K + M with M ≫ 1. It follows that Hits@K is not a suitable
metric for drawing comparisons between models [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. MR alleviates this concern as it does not
rely on any threshold K. Therefore, MR makes it possible to compare KGEM performance on the same
dataset. Nonetheless, MR is sensitive to the number of KG entities (see Equation (2)) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]: an MR
of 10 indicates very good performance if the set of entities is in the thousands, but it would
indicate poor performance if the set of entities is much more restricted. Therefore, MR does not
allow comparisons across datasets.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Combining Embeddings with Semantics</title>
        <p>
          The possibility of using additional semantic information to enhance KGEM performance has
been extensively studied. A significant part of the literature incorporates semantic information
to constrain the negative sampling procedure and generate meaningful negative triples [
          <xref ref-type="bibr" rid="ref4 ref5">5, 4</xref>
          ].
For instance, type-constrained negative sampling (TCNS) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] replaces the head or the tail of a
triple with a random entity belonging to the same type as the ground-truth entity. Jain et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
go a step further and use ontological reasoning to iteratively improve KGEM performance by
retraining the model on inconsistent predictions. Semantic information can also be embedded
in the model itself [14, 15]. In [15], the proposed KGEM embeds both entities and entity types,
which allows entities to have different vector representations depending on their respective
types.
        </p>
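        <p>Type-constrained negative sampling, as summarized above, can be sketched as follows. This is an illustrative reading of TCNS [4], not its reference implementation: it assumes a hypothetical dictionary <monospace>entity_type</monospace> that maps each entity to a single type, and it draws the replacement among the other entities sharing the type of the entity being replaced.</p>
        <preformat preformat-type="code">
```python
import random


def type_constrained_corrupt(triple, entity_type, rng):
    """Corrupt the head or the tail with a random entity of the same type (TCNS sketch).

    `entity_type` maps every entity to one type (single typing is an assumption).
    Returns None when no other entity of the required type exists.
    """
    head, relation, tail = triple
    replace_head = rng.random() >= 0.5
    original = head if replace_head else tail
    same_type = [e for e, t in entity_type.items()
                 if t == entity_type[original] and e != original]
    if not same_type:
        return None
    candidate = rng.choice(same_type)
    return (candidate, relation, tail) if replace_head else (head, relation, candidate)


entity_type = {"BarackObama": "Person", "MaliaObama": "Person",
               "SashaObama": "Person", "USA": "Country", "France": "Country"}
rng = random.Random(42)
print(type_constrained_corrupt(("BarackObama", "isFatherOf", "MaliaObama"), entity_type, rng))
```
        </preformat>
        <p>Unlike uniform sampling, every negative produced this way keeps the head and tail types of the original triple, so the model is trained to discriminate among type-consistent candidates.</p>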
        <p>
          Recall that embedding models project entities and relations of a KG into a vector space.
As such, the semantics of the original KG may not be fully preserved [
          <xref ref-type="bibr" rid="ref10 ref5">5, 10</xref>
          ]. As stated by
Paulheim [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], because embeddings are not meant to preserve the semantics of the KG, they are
not interpretable and this can severely hinder explainability in domains such as recommender
systems. Consequently, Paulheim [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] advocates for semantic embeddings. Similarly, Jain
et al. [11] perform a thorough evaluation of popular KGEMs to better assess their semantic
awareness. A key finding is that in most datasets, the semantic representation of some entities
is easier to capture than that of others. For instance, the task of finding semantically similar
entities does not always provide satisfying results when working with entity embeddings [11].
        </p>
        <p>Although some aforementioned approaches leverage the semantics of entities and relations
to improve KGEM performance in terms of rank-based metrics, their ability to generate meaningful
predictions is not directly addressed. This encourages further assessment of the semantic
capabilities of KGEMs. In our work, we directly address this issue by assessing to what extent
KGEMs are able to give high scores to triples whose head (resp. tail) belongs to the domain
(resp. range) of the relation.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Measuring Semantic Awareness: our Proposal</title>
      <p>Section 3.1 summarizes the KGEMs used in this work. Then, Section 3.2 details the
semantic-oriented metric used to experimentally evaluate and compare these models and highlight their
semantic awareness.</p>
      <sec id="sec-3-1">
        <title>3.1. Knowledge Graph Embedding Models</title>
        <p>
          As in [16], we study three highly popular KGEMs, namely TransE [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], DistMult [17], and
ComplEx [18].
        </p>
        <p>
          TransE is the earliest translational model. It learns representations of entities and relations
such that, for a triple (h, r, t), e<sub>h</sub> + e<sub>r</sub> ≈ e<sub>t</sub>, where e<sub>h</sub>, e<sub>r</sub>, and e<sub>t</sub> are the head, relation, and
tail embeddings, respectively. The scoring function is f(h, r, t) = −d(e<sub>h</sub> + e<sub>r</sub> − e<sub>t</sub>), with
d a distance function, usually the L<sub>1</sub> or L<sub>2</sub> norm. TransE does not properly handle 1-to-N,
N-to-1, or N-to-N relations [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and yet has been found to be very efficient in multi-relational
settings [19].
        </p>
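        <p>The TransE scoring function can be sketched in plain Python as follows, with embeddings represented as lists of floats and the choice between the L<sub>1</sub> and L<sub>2</sub> norm exposed as a parameter <monospace>p</monospace>. This illustrates the formula only; it is not the implementation used in the experiments.</p>
        <preformat preformat-type="code">
```python
import math


def transe_score(e_h, e_r, e_t, p=1):
    """TransE score f(h, r, t) = -||e_h + e_r - e_t||_p, with p = 1 or p = 2.

    Embeddings are plain lists of floats. A higher (less negative) score
    means a more plausible triple.
    """
    diff = [h + r - t for h, r, t in zip(e_h, e_r, e_t)]
    if p == 1:
        return -sum(abs(x) for x in diff)
    if p == 2:
        return -math.sqrt(sum(x * x for x in diff))
    raise ValueError("p must be 1 or 2")


e_h, e_r = [1.0, 2.0], [0.5, -1.0]
print(transe_score(e_h, e_r, [1.5, 1.0]))  # tail equal to e_h + e_r: maximal score
print(transe_score(e_h, e_r, [0.0, 0.0]))  # tail far from e_h + e_r: lower score
```
        </preformat>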
        <p>DistMult is a semantic matching model. It is characterized as such because it uses a
similarity-based scoring function and matches the latent semantics of entities and relations by leveraging
their vector space representations. More specifically, DistMult is a bilinear diagonal model
that uses a trilinear dot product as its scoring function: f(h, r, t) = ⟨e<sub>h</sub>, W<sub>r</sub>, e<sub>t</sub>⟩. It is similar
to RESCAL [20], the very first semantic matching model, but restricts relation matrices
W<sub>r</sub> ∈ ℝ<sup>d×d</sup> to be diagonal. As the scoring function of DistMult is symmetric in h and t, all relations
are considered symmetric. This assumption does not hold in general. However, DistMult still
achieves state-of-the-art performance in most cases [21].</p>
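        <p>The sketch below implements the DistMult score in plain Python. It stores only the diagonal of W<sub>r</sub> as a vector <monospace>w_r</monospace> (an implementation choice assumed here) and shows that swapping head and tail leaves the score unchanged, which is why all relations end up being treated as symmetric.</p>
        <preformat preformat-type="code">
```python
def distmult_score(e_h, w_r, e_t):
    """DistMult trilinear product: sum_i e_h[i] * w_r[i] * e_t[i].

    `w_r` stores the diagonal of the relation matrix W_r.
    """
    return sum(h * r * t for h, r, t in zip(e_h, w_r, e_t))


e_a, e_b = [1.0, -2.0, 0.5], [0.5, 1.0, 4.0]
w_r = [2.0, 1.0, -1.0]
# Swapping head and tail gives the same score: every relation is treated as symmetric.
print(distmult_score(e_a, w_r, e_b), distmult_score(e_b, w_r, e_a))  # -3.0 -3.0
```
        </preformat>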
        <p>ComplEx is also a semantic matching model. It extends DistMult by using complex-valued
vectors to represent entities and relations: e<sub>h</sub>, e<sub>r</sub>, e<sub>t</sub> ∈ ℂ<sup>d</sup>. As a result, ComplEx can model
antisymmetric relations, which DistMult cannot [22]. Its scoring function uses the Hadamard
product: f(h, r, t) = Re(∑<sub>i</sub> (e<sub>h</sub> ⊙ e<sub>r</sub> ⊙ ē<sub>t</sub>)<sub>i</sub>), where ē<sub>t</sub> denotes the complex conjugate of e<sub>t</sub>.</p>
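        <p>A plain-Python sketch of the ComplEx score, using the built-in <monospace>complex</monospace> type for embeddings; the toy vectors are arbitrary. Taking the conjugate of the tail embedding is what makes the score depend on the order of head and tail.</p>
        <preformat preformat-type="code">
```python
def complex_score(e_h, e_r, e_t):
    """ComplEx score f(h, r, t) = Re(sum_i e_h[i] * e_r[i] * conj(e_t[i])).

    Embeddings are lists of Python complex numbers.
    """
    return sum(h * r * t.conjugate() for h, r, t in zip(e_h, e_r, e_t)).real


e_a, e_b = [1 + 1j, 0.5 - 2j], [2 - 1j, 1 + 1j]
w_r = [0.5 + 1j, -1 + 0.5j]
# With non-real relation embeddings, swapping head and tail changes the score.
print(complex_score(e_a, w_r, e_b), complex_score(e_b, w_r, e_a))  # 0.25 3.75
```
        </preformat>
        <p>If every entry of <monospace>w_r</monospace> has a zero imaginary part, the two scores coincide, i.e., the relation is modeled as symmetric, as in DistMult.</p>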
      </sec>
      <sec id="sec-3-2">
        <title>3.2. A Semantic-Oriented Metric for Measuring Link Prediction Performance</title>
        <p>
          The standard LP evaluation protocol consists of reporting aggregated results based on the
rank-based metrics presented in Section 2.1. As mentioned in Section 1, these metrics only
provide a partial picture of KGEM performance [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. To give a more comprehensive assessment
of KGEMs, we propose to also assess their semantic awareness using Sem@K [13]. In [13],
Sem@K is specifically defined for the recommendation task, seen as predicting tails for a single
target relation. In this work, we extend Sem@K to the more generic LP task, where not only
tails but also heads are corrupted and all relations are considered equally.
        </p>
        <p>This adapted version of Sem@K (Equation (4)) accounts for the proportion of triples that are
semantically valid among the first K top-scored triples:
<disp-formula><label>(4)</label><tex-math>\mathrm{Sem@}K = \frac{1}{|\mathcal{B}|} \sum_{q \in \mathcal{B}} \frac{1}{K} \sum_{q' \in \mathcal{S}_q^K} \mathrm{compatibility}(q, q')</tex-math></disp-formula>
where, given a ground-truth triple q = (h, r, t), 𝒮<sub>q</sub><sup>K</sup> is the set of top-K candidate triples scored by
a given KGEM (i.e., by predicting the tail for (h, r, ?) or the head for (?, r, t)). The operator
compatibility(q, q′) (Equation (5)) assesses whether the candidate triple q′ is semantically
compatible with its ground-truth counterpart q. In this work, semantic compatibility means
that the predicted head (resp. tail) belongs to the domain (resp. range) of the
relation:
<disp-formula><label>(5)</label><tex-math>\mathrm{compatibility}(q, q') = \begin{cases} 1, &amp; \text{if } \mathrm{type}(h') = \mathrm{domain}(r) \wedge \mathrm{type}(t') = \mathrm{range}(r), \\ 0, &amp; \text{otherwise,} \end{cases}</tex-math></disp-formula>
where type(e) returns the type of entity e and domain(r) (resp. range(r)) is the domain (resp.
range) of the relation r. Here, r, h′, and t′ denote the ground-truth relation, the head, and the tail of
the ranked triple q′, respectively. Note that the type hierarchy is not considered in this work.</p>
        <p>
          Sem@K is bounded in the [0, 1] interval. Unlike Hits@K (Equation (1)), which never decreases as K grows, Sem@K is
non-monotonic: increasing K can lead to either lower or higher Sem@K values.
        </p>
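        <p>Equations (4) and (5) can be sketched in Python as follows. The sketch assumes that each entity has a single type (no hierarchy), that hypothetical dictionaries <monospace>domain</monospace> and <monospace>range_</monospace> map each relation to one type (the trailing underscore only avoids shadowing Python's built-in <monospace>range</monospace>), and that the candidate triples of each test query have already been sorted by the KGEM. All names and toy data are illustrative.</p>
        <preformat preformat-type="code">
```python
def compatibility(candidate, entity_type, domain, range_):
    """Equation (5): 1 if the candidate head is in the domain and its tail in the range.

    The candidate shares the relation r of its ground-truth triple, so r is read
    from the candidate itself.
    """
    head, relation, tail = candidate
    valid = (entity_type[head] == domain[relation]
             and entity_type[tail] == range_[relation])
    return 1 if valid else 0


def sem_at_k(ranked_candidates, k, entity_type, domain, range_):
    """Equation (4): average, over ground-truth triples, of the share of compatible top-k candidates.

    `ranked_candidates` holds, for each ground-truth triple, its candidate triples
    sorted by decreasing KGEM score (at least k of them).
    """
    shares = []
    for candidates in ranked_candidates:
        compatible = sum(compatibility(c, entity_type, domain, range_) for c in candidates[:k])
        shares.append(compatible / k)
    return sum(shares) / len(shares)


entity_type = {"alice": "Person", "bob": "Person", "paris": "City", "lyon": "City"}
domain, range_ = {"livesIn": "Person"}, {"livesIn": "City"}
# Tail prediction for (alice, livesIn, ?), candidates sorted by decreasing score.
query = [("alice", "livesIn", "paris"), ("alice", "livesIn", "bob"), ("alice", "livesIn", "lyon")]
for k in (1, 2, 3):
    print(k, sem_at_k([query], k, entity_type, domain, range_))
```
        </preformat>
        <p>The resulting values, 1.0, 0.5, and about 0.67 for K = 1, 2, and 3, illustrate the non-monotonic behaviour of Sem@K.</p>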
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Datasets</title>
        <p>In this section, we assess the performance of TransE, DistMult, and ComplEx in terms of MRR and
Sem@K on real-world, public datasets.</p>
        <p>The experiments are carried out on EduKG<sup>1</sup> [23], FB15K-237 [24], and KG20C [25], three KGs
that have been chosen for their adequate entity typing: given a head-relation pair (h, r) (resp. a
relation-tail pair (r, t)), the missing tail (resp. head) can only be of a single type. In addition,
these datasets comprise different numbers of entities, relations, and entity types, as depicted in
Table 1. Consequently, they allow us to study whether the performance of a KGEM in terms of
Sem@K is dataset-dependent. Note that relations with fewer than 10 semantically valid heads or
tails are removed from the test set, so that Sem@10 is not wrongfully lowered by an insufficient
number of candidates for either the domain or range of such relations.</p>
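        <p>The test-set filtering rule can be sketched as follows: a relation is kept only if at least 10 entities belong to its domain and at least 10 to its range. This reading of the rule, the single-typing assumption, and the hypothetical dictionaries <monospace>domain</monospace> and <monospace>range_</monospace> (mapping each relation to a type) are ours; the toy data is illustrative.</p>
        <preformat preformat-type="code">
```python
from collections import Counter


def filter_test_triples(test_triples, entity_type, domain, range_, min_candidates=10):
    """Drop test triples whose relation has fewer than `min_candidates` valid heads or tails."""
    type_counts = Counter(entity_type.values())  # number of entities per type
    kept_relations = {
        r for r in domain
        if type_counts[domain[r]] >= min_candidates
        and type_counts[range_[r]] >= min_candidates
    }
    return [t for t in test_triples if t[1] in kept_relations]


entity_type = {f"person{i}": "Person" for i in range(12)}
entity_type.update({f"city{i}": "City" for i in range(3)})
domain = {"knows": "Person", "livesIn": "Person"}
range_ = {"knows": "Person", "livesIn": "City"}
test = [("person0", "knows", "person1"), ("person2", "livesIn", "city0")]
print(filter_test_triples(test, entity_type, domain, range_))  # only the 'knows' triple is kept
```
        </preformat>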
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Experimental Setup</title>
        <p>TransE, DistMult, and ComplEx are implemented using PyTorch<sup>2</sup>. All models are trained for
1,000 epochs, with one generated negative triple for each positive triple. The max-margin loss
function and the Adam optimizer are used, as in [16]. To choose the best hyperparameters<sup>2</sup> for each
model and dataset, a grid search was performed on the validation sets, with possible values for
the embedding dimension d ∈ {10, 20, 50, 100, 200, 300}, learning rate η ∈ [10<sup>−4</sup>, 10<sup>−1</sup>], and
margin for the loss function γ ∈ {1, 2, 5, 10, 15, 20}. In line with [22], we experimentally found
that no regularization was needed, as a good choice of  prevents KGEMs from overfitting.
Recall that, given a ground-truth triple (h, r, t), all possible triples (?, r, t) and (h, r, ?) are
generated with all the entities observed in the KG and scored by the KGEMs.</p>
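        <p>A minimal sketch of the max-margin loss with one negative per positive, together with an enumeration of the hyperparameter grid described above. It assumes the usual form max(0, γ − f(positive) + f(negative)) for scores where higher means more plausible; the four learning-rate values are hypothetical samples, since the text only gives the range [10<sup>−4</sup>, 10<sup>−1</sup>].</p>
        <preformat preformat-type="code">
```python
import itertools


def margin_loss(pos_scores, neg_scores, margin):
    """Mean of max(0, margin - f(pos) + f(neg)) over (positive, negative) pairs.

    Higher scores mean more plausible triples; one negative per positive.
    """
    losses = [max(0.0, margin - p + n) for p, n in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)


dims = [10, 20, 50, 100, 200, 300]
learning_rates = [1e-4, 1e-3, 1e-2, 1e-1]  # hypothetical points in [1e-4, 1e-1]
margins = [1, 2, 5, 10, 15, 20]
grid = list(itertools.product(dims, learning_rates, margins))
print(len(grid), margin_loss([3.0, 0.5], [1.0, 1.0], margin=1))  # 144 0.75
```
        </preformat>
        <p>In PyTorch, the same per-pair loss is provided by <monospace>torch.nn.MarginRankingLoss</monospace> when called with the positive scores, the negative scores, and a target of 1.</p>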
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Results</title>
        <p><sup>1</sup> purl.org/edukg/doc</p>
        <p><sup>2</sup> The code of our experiments and the best hyperparameters are available at purl.org/dl4kg-2022.</p>
        <fig>
          <label>Figure 1</label>
          <caption>
            <p>Sem@K (left axis) and MRR (right axis) with respect to the number of epochs. Panels: (a) TransE – EduKG; (d) TransE – FB15K-237; (e) DistMult – FB15K-237; (f) ComplEx – FB15K-237; (g) TransE – KG20C; (h) DistMult – KG20C; (i) ComplEx – KG20C.</p>
          </caption>
        </fig>
      </sec>
      <sec id="sec-4-4">
        <title>Analysis of Sem@K w.r.t. the Number of Epochs</title>
        <p>Here, we analyze Sem@K with respect to the number of epochs (Figure 1). Sem@K reaches high values during the first
epochs, regardless of the model and dataset at hand. Most notably, using TransE on KG20C
(Figure 1g), Sem@K reaches its maximum values after only 100 epochs, with a near-perfect
ability to infer the semantic profile of the relations, as evidenced in Table 2. Therefore, it seems
that agnostic models quickly learn the domain and range of relations.</p>
        <p>It is worth mentioning that the decline of Sem@K appears to be less marked for lower K
values. To substantiate this claim, let us consider the Sem@K inflexion point, after which
Sem@K starts to decrease, and calculate the difference between the best achieved Sem@K
and Sem@K at epoch 1,000. For example, the best Sem@K values using TransE on EduKG
are 97.2%, 98.0%, and 97.9% for K = 1, K = 5, and K = 10, respectively. Sem@1, Sem@5,
and Sem@10 at epoch 1,000 are 96.0% (−1.2 pts), 94.1% (−4.0 pts), and 90.2% (−7.8 pts),
respectively. Except for TransE on FB15K-237 and DistMult on KG20C, all the other model/dataset
combinations lead to the same conclusion: Sem@10 decreases more than
Sem@5, which in turn decreases more than Sem@1. This is clearly visible in Figure 1a.
<bold>Comparison of Sem@K Across Models.</bold> In Figure 1, we also note that, regarding the decline of Sem@K
over the epochs, DistMult and ComplEx seem to be more robust than TransE. For example, compared
with the previously mentioned Sem@K losses using TransE on EduKG, the losses for DistMult
are all below 2.1 pts for Sem@1, Sem@5, and Sem@10.
For ComplEx, they are all below 1.1 pts. It is worth
investigating whether there is a theoretical explanation for semantic matching models being
more robust to Sem@K decline than translational models. This requires further experiments
with additional models and a theoretical demonstration that these differences are due to the
nature of the KGEMs. This question is left for future research. Moreover, it is noteworthy that
Sem@K may depend on the number of entities belonging to each class. For example, KG20C
has more entities and fewer classes than EduKG. Hence, on KG20C, there is a higher probability of
predicting semantically valid triples, which seems to be reflected in the higher Sem@K values.
<bold>Comparison of MRR and Sem@K.</bold> Overall, the best models in terms of MRR are not
necessarily the best regarding Sem@K (Table 2). For instance, considering the results achieved
on KG20C at epoch 100, TransE has the lowest MRR of the three models: MRR<sub>TransE</sub> = 0.094,
MRR<sub>DistMult</sub> = 0.115, and MRR<sub>ComplEx</sub> = 0.149. However, at this very same epoch, TransE
showcases near-perfect Sem@K values, whereas the values achieved with DistMult and
ComplEx remain markedly lower (Figure 1h and Figure 1i). This may not be
dataset-dependent, as the same remark applies to EduKG and FB15K-237. In addition, we note that,
regardless of the model and dataset at hand, Sem@K starts to decrease well before MRR in terms
of epochs (Figure 1). This means that while Sem@K starts to decline, MRR values keep
rising. As such, maximizing Sem@K leads to non-optimal values for MRR, and vice versa.
Therefore, a trade-off has to be considered. Some use cases may require ensuring homogeneity in
the types of the top-ranked entities. In such scenarios, it may be advisable to retain Sem@K as
an additional criterion for tracking the best epoch and performing early-stopping.</p>
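        <p>The decline measure used above, and the suggestion of using Sem@K as an additional criterion for tracking the best epoch, can be sketched as follows. The per-epoch values are made up for illustration, and the selection rule (keep the best-MRR epoch among those whose Sem@K stays within a tolerance of its maximum) is a hypothetical choice, not the procedure used in these experiments.</p>
        <preformat preformat-type="code">
```python
def decline_from_best(curve):
    """Points lost between the best value of a curve and its last value."""
    return max(curve) - curve[-1]


def select_epoch(epochs, mrr, sem, tolerance):
    """Best-MRR epoch among those whose Sem@K is within `tolerance` of its maximum."""
    best_sem = max(sem)
    eligible = [i for i in range(len(epochs)) if sem[i] >= best_sem - tolerance]
    best = max(eligible, key=lambda i: mrr[i])
    return epochs[best]


epochs = [100, 200, 300, 400, 500]
mrr = [0.10, 0.14, 0.17, 0.19, 0.20]  # made-up values: MRR keeps rising
sem = [97.0, 97.5, 96.8, 94.0, 91.0]  # made-up Sem@10 (%): peaks early, then declines
print(decline_from_best(sem))
print(select_epoch(epochs, mrr, sem, tolerance=1.0))
```
        </preformat>
        <p>With a tolerance of 0, the rule returns the Sem@K peak; with a very large tolerance, it falls back to the usual best-MRR epoch, so the tolerance controls the trade-off discussed above.</p>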
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Directions</title>
      <p>In this work, we consider the link prediction task and measure the ability of popular KGEMs to
predict entities that are semantically valid. Two main findings deserve to be highlighted. We
show that agnostic KGEMs can learn the semantic profile of relations. In some cases, however,
this comes at the expense of the KGEM performance in terms of traditional rank-based metrics.
Thus, there seems to be a trade-off between the semantic awareness of KGEMs and their ability
to give higher scores to ground-truth entities. Consequently, our take-home message is that both
the training and evaluation of KGEMs would benefit from including Sem@K as an additional
criterion alongside commonly used rank-based metrics. Indeed, their combination would give a
more complete picture of KGEM performance, as Sem@K provides information on the nature of
the errors made by a link prediction model. In future work, we will extend our analysis to
more datasets and models, including Graph Neural Networks. We will also explore how type
hierarchies and missing or evolving domains and ranges of relations can be taken into account.
      </p>
      <ref-list>
        <ref><mixed-citation>[10] …Demonstrations, Industry and Blue Sky Ideas Tracks, volume 2180 of CEUR Workshop Proceedings, 2018.</mixed-citation></ref>
        <ref><mixed-citation>[11] N. Jain, J. Kalo, W. Balke, R. Krestel, Do embeddings actually capture knowledge graph semantics?, in: The Semantic Web - 18th International Conf., ESWC, volume 12731 of LNCS, Springer, 2021, pp. 143–159.</mixed-citation></ref>
        <ref><mixed-citation>[12] P. Monnin, C. Raïssi, A. Napoli, A. Coulet, Discovering alignment relations with graph convolutional networks: A biomedical case study, Semantic Web 13 (2022) 379–398.</mixed-citation></ref>
        <ref><mixed-citation>[13] N. Hubert, P. Monnin, A. Brun, D. Monticolo, New strategies for learning knowledge graph embeddings: The recommendation case, in: EKAW - 23rd International Conf. on Knowledge Engineering and Knowledge Management, Springer, 2022, pp. 66–80.</mixed-citation></ref>
        <ref><mixed-citation>[14] S. Yang, J. Tian, H. Zhang, J. Yan, H. He, Y. Jin, TransMS: Knowledge graph embedding for complex relations by multidirectional semantics, in: Proc. of the Twenty-Eighth International Joint Conf. on Artificial Intelligence, IJCAI, 2019, pp. 1935–1942.</mixed-citation></ref>
        <ref><mixed-citation>[15] P. Wang, J. Zhou, Y. Liu, X. Zhou, TransET: Knowledge graph embedding with entity types, Electronics 10 (2021) 1407.</mixed-citation></ref>
        <ref><mixed-citation>[16] B. Kotnis, V. Nastase, Analysis of the impact of negative sampling on link prediction in knowledge graphs, arXiv preprint 1708.06816 (2017).</mixed-citation></ref>
        <ref><mixed-citation>[17] B. Yang, W. Yih, X. He, J. Gao, L. Deng, Embedding entities and relations for learning and inference in knowledge bases, in: 3rd International Conf. on Learning Representations, ICLR, 2015.</mixed-citation></ref>
        <ref><mixed-citation>[18] T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, G. Bouchard, Complex embeddings for simple link prediction, in: Proc. of the 33rd International Conf. on Machine Learning, ICML, volume 48, 2016, pp. 2071–2080.</mixed-citation></ref>
        <ref><mixed-citation>[19] G. Chowdhury, M. Srilakshmi, M. Chain, S. Sarkar, Neural factorization for offer recommendation using knowledge graph embeddings, in: Proc. of the SIGIR Workshop on eCommerce, volume 2410, 2019.</mixed-citation></ref>
        <ref><mixed-citation>[20] M. Nickel, V. Tresp, H. Kriegel, A three-way model for collective learning on multi-relational data, in: Proc. of the 28th International Conf. on Machine Learning, ICML, 2011, pp. 809–816.</mixed-citation></ref>
        <ref><mixed-citation>[21] R. Kadlec, O. Bajgar, J. Kleindienst, Knowledge base completion: Baselines strike back, in: Proc. of the 2nd Workshop on Representation Learning for NLP, Rep4NLP@ACL, 2017, pp. 69–74.</mixed-citation></ref>
        <ref><mixed-citation>[22] Z. Sun, Z. Deng, J. Nie, J. Tang, RotatE: Knowledge graph embedding by relational rotation in complex space, in: 7th International Conf. on Learning Representations, ICLR, 2019.</mixed-citation></ref>
        <ref><mixed-citation>[23] N. Hubert, A. Brun, D. Monticolo, New Ontology and Knowledge Graph for University Curriculum Recommendation, in: Proc. of the ISWC Posters &amp; Demo Track, 2022, pp. 1–5.</mixed-citation></ref>
        <ref><mixed-citation>[24] K. Toutanova, D. Chen, Observed versus latent features for knowledge base and text inference, in: Proc. of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, Association for Computational Linguistics, 2015, pp. 57–66.</mixed-citation></ref>
        <ref><mixed-citation>[25] H. N. Tran, A. Takasu, Exploring scholarly data by semantic query on knowledge graph embedding space, in: Digital Libraries for Open Knowledge - 23rd International Conf. on Theory and Practice of Digital Libraries, TPDL, volume 11799, Springer, 2019, pp. 154–162.</mixed-citation></ref>
      </ref-list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>Knowledge graph embedding: A survey of approaches and applications</article-title>
          ,
          <source>IEEE Trans. Knowl. Data Eng</source>
          .
          <volume>29</volume>
          (
          <year>2017</year>
          )
          <fpage>2724</fpage>
          -
          <lpage>2743</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rossi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Barbosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Firmani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Matinata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Merialdo</surname>
          </string-name>
          ,
          <article-title>Knowledge graph embedding for link prediction: A comparative analysis</article-title>
          ,
          <source>ACM Transactions on Knowledge Discovery from Data</source>
          <volume>15</volume>
          (
          <year>2021</year>
          )
          <fpage>14:1</fpage>
          -
          <lpage>14:49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Usunier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García-Durán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Yakhnenko</surname>
          </string-name>
          ,
          <article-title>Translating embeddings for modeling multi-relational data</article-title>
          ,
          <source>in: Conf. on Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>2787</fpage>
          -
          <lpage>2795</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Krompaß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Baier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tresp</surname>
          </string-name>
          ,
          <article-title>Type-constrained representation learning in knowledge graphs</article-title>
          , in: The Semantic Web - 14th
          <source>International Semantic Web Conf. (ISWC)</source>
          , volume
          <volume>9366</volume>
          , Springer,
          <year>2015</year>
          , pp.
          <fpage>640</fpage>
          -
          <lpage>655</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Gad-Elrab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Stepanova</surname>
          </string-name>
          ,
          <article-title>Improving knowledge graph embeddings with ontological reasoning</article-title>
          , in: The Semantic Web -
          <source>International Semantic Web Conf. (ISWC)</source>
          , volume
          <volume>12922</volume>
          ,
          <year>2021</year>
          , pp.
          <fpage>410</fpage>
          -
          <lpage>426</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Berrendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Faerman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Vermue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tresp</surname>
          </string-name>
          ,
          <article-title>On the ambiguity of rank-based evaluation of entity alignment or link prediction methods</article-title>
          ,
          <source>arXiv preprint arXiv:2002.06914</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C. T.</given-names>
            <surname>Hoyt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Berrendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Galkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tresp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Gyori</surname>
          </string-name>
          ,
          <article-title>A unified framework for rank-based evaluation metrics for link prediction in knowledge graphs</article-title>
          ,
          <source>arXiv preprint arXiv:2203.07544</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tiwari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Rivero</surname>
          </string-name>
          ,
          <article-title>Revisiting the evaluation protocol of knowledge graph completion methods for link prediction</article-title>
          ,
          <source>in: WWW '21: The Web Conf., ACM / IW3C2</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>809</fpage>
          -
          <lpage>820</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ruffinelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gemulla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Broscheit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Meilicke</surname>
          </string-name>
          ,
          <article-title>On evaluating embedding models for knowledge base completion</article-title>
          ,
          <source>in: Proc. of the 4th Workshop on Representation Learning for NLP, RepL4NLP@ACL</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>104</fpage>
          -
          <lpage>112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>Make embeddings semantic again!</article-title>
          ,
          <source>in: Proc. of the ISWC Posters &amp; Demonstrations</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>