<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Analyzing Knowledge Graph Embedding Methods from a Multi-Embedding Interaction Perspective</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hung Nghiep Tran</string-name>
          <email>nghiepth@nii.ac.jp</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Atsuhiro Takasu</string-name>
          <email>takasu@nii.ac.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>National Institute of Informatics</institution>
          ,
          <addr-line>Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>SOKENDAI (The Graduate University for Advanced Studies)</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Knowledge graphs are a popular format for representing knowledge, with many applications to semantic search engines, question-answering systems, and recommender systems. Real-world knowledge graphs are usually incomplete, so knowledge graph embedding methods, such as Canonical decomposition/Parallel factorization (CP), DistMult, and ComplEx, have been proposed to address this issue. These methods represent entities and relations as embedding vectors in semantic space and predict the links between them. The embedding vectors themselves contain rich semantic information and can be used in other applications such as data analysis. However, mechanisms in these models and the embedding vectors themselves vary greatly, making it difficult to understand and compare them. Given this lack of understanding, we risk using them ineffectively or incorrectly, particularly for complicated models, such as CP, with two role-based embedding vectors, or the state-of-the-art ComplEx model, with complex-valued embedding vectors. In this paper, we propose a multi-embedding interaction mechanism as a new approach to uniting and generalizing these models. We derive them theoretically via this mechanism and provide empirical analyses and comparisons between them. We also propose a new multi-embedding model based on quaternion algebra and show that it achieves promising results using popular benchmarks.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graph</kwd>
        <kwd>Knowledge Graph Completion</kwd>
        <kwd>Knowledge Graph Embedding</kwd>
        <kwd>Multi-Embedding</kwd>
        <kwd>Representation Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Knowledge graphs provide a unified format for representing
knowledge about relationships between entities. A knowledge
graph is a collection of triples, with each triple (h, t, r ) denoting
the fact that relation r exists between head entity h and tail
entity t . Many large real-world knowledge graphs have been built,
including WordNet [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] representing English lexical knowledge,
and Freebase [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and Wikidata [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] representing general
knowledge. Moreover, a knowledge graph can be used as a universal
format for data from applied domains. For example, a
knowledge graph for recommender systems would have triples such as
(UserA, Item1, review) and (UserB, Item2, like).
      </p>
      <p>
        Knowledge graphs are the cornerstones of modern semantic
web technology. They have been used by large companies such as
Google to provide semantic meaning in many traditional
applications, such as semantic search engines, semantic browsing, and
question answering [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. One important application of knowledge
graphs is recommender systems, where they are used to unite
multiple sources of data and incorporate external knowledge [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
[
        <xref ref-type="bibr" rid="ref36">36</xref>
        ]. Recently, specific methods such as knowledge graph
embedding have been used to predict user interactions and provide
recommendations directly [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>Real-world knowledge graphs are usually incomplete. For
example, Freebase and Wikidata are very large but they do not
contain all knowledge. This is especially true for the knowledge
graphs used in recommender systems. During system operation,
users review new items or like new items, generating new triples
for the knowledge graph, which is therefore inherently
incomplete. Knowledge graph completion, or link prediction, is the task
that aims to predict new triples.</p>
      <p>
        This task can be undertaken by using knowledge graph
embedding methods, which represent entities and relations as
embedding vectors in semantic space, then model the interactions
between these embedding vectors to compute matching scores
that predict the validity of each triple. Knowledge graph
embedding methods are not only used for knowledge graph completion,
but the learned embedding vectors of entities and relations are
also very useful. They contain rich semantic information similar
to word embeddings [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], enabling them to be used
in visualization or browsing for data analysis. They can also be
used as extracted or pretrained feature vectors in other learning
models for tasks such as classification, clustering, and ranking.
      </p>
      <p>
        Among the many proposed knowledge graph embedding
methods, the most efficient and effective involve
trilinear-product-based models, such as Canonical decomposition/Parallel
factorization (CP) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], DistMult [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ], or the state-of-the-art
ComplEx model [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. These models solve a tensor decomposition
problem with the matching score of each triple modeled as the
result of a trilinear product, i.e., a multilinear map with three
variables corresponding to the embedding vectors h, t , and r
of head entity h, tail entity t , and relation r , respectively. The
trilinear-product-based score function for the three embedding
vectors is denoted as ⟨h, t, r ⟩ and will be defined mathematically
in Section 2.
      </p>
      <p>
        However, the implementations of embedding vectors for the
various models are very diverse. DistMult [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ] uses one
real-valued embedding vector for each entity or relation. The original
CP [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] uses one real-valued embedding vector for each relation,
but two real-valued embedding vectors for each entity when it is
as head and as tail, respectively. ComplEx [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] uses one
complex-valued embedding vector for each entity or relation. Moreover, a
recent heuristic for CP [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], here denoted as CPh , was proposed
to augment the training data, helping CP achieve results
competitive with the state-of-the-art model ComplEx. This heuristic
introduces an additional embedding vector for each relation, but
the underlying mechanism is different from that in ComplEx. All
of these complications make it difficult to understand and
compare the various models, to know how to use them and extend
them. If we were to use the embedding vectors for data analysis
or as pretrained feature vectors, a good understanding would
affect the way we would use the complex-valued embedding
vectors from ComplEx or the different embedding vectors for head
and tail roles from CP.
      </p>
      <p>In this paper, we propose a multi-embedding interaction
mechanism as a new approach to uniting and generalizing the above
models. In the proposed mechanism, each entity e is represented
by multiple embedding vectors {e^(1), e^(2), . . .} and each relation
r is represented by multiple embedding vectors {r^(1), r^(2), . . .}.
In a triple (h, t, r), all embedding vectors of h, t, and r interact
with each other by trilinear products to produce multiple
interaction scores. These scores are then summed, weighted by a weight
vector ω, to produce the final matching score for the triple. We
show that the above models are special cases of this mechanism.
Therefore, it unifies those models and lets us compare them
directly. The mechanism also enables us to develop new models by
extending to additional embedding vectors.</p>
      <p>In this paper, our contributions include the following.
• We introduce a multi-embedding interaction mechanism
as a new approach to unifying and generalizing a class of
state-of-the-art knowledge graph embedding models.
• We derive each of the above models theoretically via this
mechanism. We then empirically analyze and compare
these models with each other and with variants.
• We propose a new multi-embedding model by an
extension to four embedding vectors based on quaternion
algebra, which is an extension of complex algebra. We show
that this model achieves promising results.</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
        Knowledge graph embedding methods for link prediction are
actively being researched [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ]. Here, we only review the works
that are directly related to this paper, namely models that use only
triples, not external data such as text [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ] or graph structure such
as relation paths [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Models using only triples are relatively
simple and they are also the current state of the art.
      </p>
    </sec>
    <sec id="sec-3">
      <title>General architecture</title>
      <p>
        Knowledge graph embedding models take a triple of the form
(h, t, r ) as input and output the validity of that triple. A general
model can be viewed as a three-component architecture:
(1) Embedding lookup: linear mapping from one-hot vectors
to embedding vectors. A one-hot vector is a sparse
discrete vector representing a discrete input, e.g., the first
entity could be represented as [1, 0, . . . , 0]⊤. A triple could
be represented as a tuple of three one-hot vectors
representing h, t, and r, respectively. An embedding vector is
a dense continuous vector of much lower dimensionality
than a one-hot vector, thus leading to efficient distributed
representations [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
(2) Interaction mechanism: modeling the interaction between
embedding vectors to compute the matching score of a
triple. This is the main component of a model.
(3) Prediction: using the matching score to predict the validity
of each triple. A higher score means that the triple is more
likely to be valid.
      </p>
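      <p>As an illustration of this three-component view, the following minimal Python sketch (names and sizes are our own illustrative assumptions, not from the original) wires together embedding lookup, a placeholder interaction mechanism, and score-based prediction:</p>
      <preformat>
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_relations, D = 100, 10, 8  # toy sizes (assumption)

# (1) Embedding lookup: a one-hot index selects a row of a dense matrix.
E = rng.normal(size=(n_entities, D))   # entity embeddings
R = rng.normal(size=(n_relations, D))  # relation embeddings

def score(h_idx, t_idx, r_idx):
    h, t, r = E[h_idx], E[t_idx], R[r_idx]
    # (2) Interaction mechanism: here the trilinear product of
    # Section 2.2.3; other models plug in other mechanisms.
    return np.sum(h * t * r)

# (3) Prediction: a higher score means the triple is more likely valid.
print(score(0, 1, 2))
      </preformat>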
    </sec>
    <sec id="sec-4">
      <title>Categorization</title>
      <p>Based on the modeling of the second component, a knowledge
graph embedding model falls into one of three categories, namely
translation-based, neural-network-based, or trilinear-product-based,
as described below.</p>
      <p>2.2.1 Translation-based: These models translate the head
entity embedding by summing it with the relation embedding vector,
then measure the distance between the translated image of the
head entity and the tail entity embedding, usually by L1 or L2
distance:</p>
      <p>S(h, t, r) = −||h + r − t||_p = −(Σ_{d=1}^{D} |h_d + r_d − t_d|^p)^{1/p},
where
• h, t, r are embedding vectors of h, t, and r, respectively,
• p is 1 or 2 for L1 or L2 distance, respectively,
• D is the embedding size and d is each dimension.</p>
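      <p>A minimal sketch of this translation-based score, assuming NumPy vectors (illustrative, not the original implementation):</p>
      <preformat>
import numpy as np

def transe_score(h, t, r, p=2):
    # S(h, t, r) = -||h + r - t||_p, the TransE-style score above.
    return -np.linalg.norm(h + r - t, ord=p)

rng = np.random.default_rng(0)
h, t, r = rng.normal(size=(3, 8))
print(transe_score(h, t, r, p=1), transe_score(h, t, r, p=2))
      </preformat>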
      <p>
        TransE [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] was the first model of this type, with a score function
essentially the same as the above equation. There have been many
extensions such as TransR [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], TransH [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ], and TransA [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ].
Most extensions are done by linear transformation of the entities
into a relation-specific space before translation [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <p>
        These models are simple and efficient. However, their
modeling capacity is generally weak because of the over-strong
assumption of translation using the relation embedding. Therefore, they
are unable to model some forms of data [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ].
      </p>
      <p>2.2.2 Neural-network-based: These models use a nonlinear
neural network to compute the matching score for a triple:</p>
      <p>S(h, t, r) = NN(h, t, r),
where
• h, t, r are the embedding vectors of h, t, and r, respectively,
• NN is the neural network used to compute the score.</p>
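      <p>The following sketch shows an ER-MLP-style score in the spirit of this equation: the three embeddings are concatenated and passed through a small multi-layer perceptron. All weights and sizes are illustrative assumptions:</p>
      <preformat>
import numpy as np

rng = np.random.default_rng(0)
D, H = 8, 16                      # embedding and hidden sizes (assumption)
W1 = rng.normal(size=(3 * D, H))  # layer 1: concatenated triple to hidden
w2 = rng.normal(size=(H,))        # layer 2: hidden to scalar score

def nn_score(h, t, r):
    x = np.concatenate([h, t, r])  # concatenate the three embeddings
    return np.tanh(x @ W1) @ w2    # one nonlinear hidden layer

h, t, r = rng.normal(size=(3, D))
print(nn_score(h, t, r))
      </preformat>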
      <p>
        One of the simplest neural-network-based models is ER-MLP
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which concatenates the input embedding vectors and uses a
multi-layer perceptron neural network to compute the matching
score. NTN [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] is an earlier model that employs nonlinear
activation functions to generalize the linear model RESCAL [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ].
Recent models such as ConvE [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] use convolution networks
instead of fully-connected networks.
      </p>
      <p>These models are complicated because of their use of neural
networks as a black-box universal approximator, which usually
makes them difficult to understand and expensive to use.</p>
      <p>2.2.3 Trilinear-product-based: These models compute their
scores by using the trilinear product of head, tail, and relation
embeddings, with the relation embedding playing the role of
matching weights on the dimensions of the head and tail embeddings:</p>
      <p>S(h, t, r) = ⟨h, t, r⟩ = h^⊤ diag(r) t (1)
= Σ_{d=1}^{D} (h ⊙ t ⊙ r)_d (2)
= Σ_{d=1}^{D} h_d t_d r_d, (3)
where
• h, t, r are embedding vectors of h, t, and r, respectively,
• diag(r) is the diagonal matrix of r,
• ⊙ denotes the element-wise Hadamard product,
• D is the embedding size and d is the dimension for which
h_d, t_d, and r_d are the entries.</p>
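      <p>Eqs. (1)-(3) are three forms of the same quantity; a quick numeric check (illustrative):</p>
      <preformat>
import numpy as np

rng = np.random.default_rng(0)
h, t, r = rng.normal(size=(3, 8))

s1 = h @ np.diag(r) @ t    # h^T diag(r) t, Eq. (1)
s2 = np.sum(h * t * r)     # summed Hadamard product, Eqs. (2)-(3)
assert np.allclose(s1, s2)
      </preformat>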
      <p>
        In this paper, we focus on this category, particularly on
DistMult, ComplEx, CP, and CPh with augmented data. These models
are simple, efficient, and can scale linearly with respect to
embedding size in both time and space. They are also very effective,
as has been shown by the state-of-the-art results for ComplEx
and CPh using popular benchmarks [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>
        DistMult [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ] embeds each entity and relation as a single
real-valued vector. DistMult is the simplest model in this category.
Its score function is symmetric, with the same scores for triples
(h, t, r ) and (t, h, r ). Therefore, it cannot model asymmetric data
for which only one direction is valid, e.g., asymmetric triples
such as (Paper1, Paper2, cite). Its score function is:
      </p>
      <p>S(h, t, r) = ⟨h, t, r⟩, (4)
where h, t, r ∈ R^K.</p>
      <p>
        ComplEx [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] is an extension of DistMult that uses
complex-valued embedding vectors. Each
complex number c, with real component a and imaginary component
b, can be denoted as c = a + bi. The complex conjugate c̄ of c is
c̄ = a − bi. The complex conjugate vector t̄ of t is formed from the
complex conjugates of the individual entries. Complex algebra
requires using the complex conjugate vector of tail embedding
in the inner product and trilinear product [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Thus, these
products can be antisymmetric, which enables ComplEx to model
asymmetric data [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. Its score function is:
      </p>
      <p>S(h, t, r) = Re(⟨h, t̄, r⟩), (5)
where h, t, r ∈ C^K and Re(c) means taking the real component
of the complex number c.</p>
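      <p>A minimal sketch of this ComplEx-style score using NumPy complex vectors (illustrative; note the conjugated tail):</p>
      <preformat>
import numpy as np

rng = np.random.default_rng(0)
K = 8
h, t, r = rng.normal(size=(3, K)) + 1j * rng.normal(size=(3, K))

# Real part of the trilinear product h * conj(t) * r.
s = np.real(np.sum(h * np.conj(t) * r))
print(s)
      </preformat>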
      <p>
        CP [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] is similar to DistMult but embeds entities as head
and as tail differently. Each entity e has two embedding vectors
e^(1) and e^(2) depending on its role in a triple as head or as tail,
respectively. Using different role-based embedding vectors leads
to an asymmetric score function, enabling CP to also model
asymmetric data. However, experiments have shown that CP’s
performance is very poor on unseen test data [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Its score
function is:
      </p>
      <p>S(h, t, r) = ⟨h^(1), t^(2), r⟩, (6)
where h^(1), t^(2), r ∈ R^K.</p>
      <p>
        CPh [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] is a direct extension of CP. Its heuristic augments
the training data by adding an inverse triple (t, h, r^(a)) for each
existing triple (h, t, r), where r^(a) is the augmented relation
corresponding to r. With this heuristic, CPh significantly improves CP,
achieving results competitive with ComplEx. Its score function
is:
      </p>
      <p>S(h, t, r) = ⟨h^(1), t^(2), r⟩ and ⟨t^(1), h^(2), r^(a)⟩, (7)
where h^(1), h^(2), t^(1), t^(2), r, r^(a) ∈ R^K.</p>
      <p>In the next section, we present a new approach to analyzing
these trilinear-product-based models.</p>
    </sec>
    <sec id="sec-5">
      <title>MULTI-EMBEDDING INTERACTION</title>
      <p>In this section, we first formally present the multi-embedding
interaction mechanism. We then derive each of the above
trilinear-product-based models using this mechanism, by changing the
embedding vectors and setting appropriate weight vectors. Next,
we specify our attempt at learning weight vectors automatically.
We also propose a four-embedding interaction model based on
quaternion algebra.</p>
    </sec>
    <sec id="sec-6">
      <title>Multi-embedding interaction mechanism</title>
      <p>We globally model each entity e as the multiple embedding
vectors {e^(1), e^(2), . . . , e^(n)} and each relation r as the multiple
embedding vectors {r^(1), r^(2), . . . , r^(n)}. The triple (h, t, r) is
therefore modeled by the multiple embeddings h^(i), t^(j), r^(k), i, j, k ∈
{1, ..., n}.</p>
      <p>In each triple, the embedding vectors for head, tail, and
relation interact with each and every other embedding vector to
produce multiple interaction scores. Each interaction is modeled
by the trilinear product of the corresponding embedding vectors. The
interaction scores are then summed, weighted by a weight vector:
S(h, t, r; Θ, ω) = Σ_{i,j,k} ω^(i,j,k) ⟨h^(i), t^(j), r^(k)⟩, (8)
where
• Θ is the parameter denoting the embedding vectors h^(i), t^(j), r^(k),
• ω is the parameter denoting the weight vector used to
combine the interaction scores, with ω^(i,j,k) being an
element of ω.</p>
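      <p>A direct sketch of Eq. (8), assuming NumPy arrays (illustrative): the weight tensor ω selects and signs the trilinear interactions, so each model in Table 1 corresponds to one particular ω:</p>
      <preformat>
import numpy as np

def multi_embedding_score(h_list, t_list, r_list, omega):
    # Weighted sum of all trilinear interactions, Eq. (8).
    s = 0.0
    for i, h in enumerate(h_list):
        for j, t in enumerate(t_list):
            for k, r in enumerate(r_list):
                s += omega[i, j, k] * np.sum(h * t * r)
    return s

rng = np.random.default_rng(0)
n, K = 2, 8
h_list, t_list, r_list = rng.normal(size=(3, n, K))
omega = np.zeros((n, n, n))
omega[0, 0, 0] = 1.0  # e.g., a DistMult-like weight vector
print(multi_embedding_score(h_list, t_list, r_list, omega))
      </preformat>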
    </sec>
    <sec id="sec-7">
      <title>Deriving trilinear-product-based models</title>
      <p>The existing trilinear-product-based models can be derived from
the proposed general multi-embedding interaction score function
in Eq. (8) by setting the weight vector ω as shown in Table 1.</p>
      <p>
        For DistMult, we can see the equivalence directly. For
ComplEx, we need to expand its score function following complex
algebra [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]:
Re(⟨h, t̄, r⟩) = ⟨h^(1), t^(1), r^(1)⟩ + ⟨h^(2), t^(2), r^(1)⟩
+ ⟨h^(1), t^(2), r^(2)⟩ − ⟨h^(2), t^(1), r^(2)⟩,
where e^(1) and e^(2) denote the real and imaginary component
vectors, respectively. This expansion is equivalent to the weighted
sum using the weight vectors
in Table 1. Note that by the symmetry between h and t, we can
also obtain the equivalent weight vector ComplEx equiv. 1.
By symmetry between embedding vectors of the same entity
or relation, we can also obtain the equivalent weight vectors
ComplEx equiv. 2 and ComplEx equiv. 3.
      </p>
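      <p>This equivalence can be checked numerically; a sketch assuming h^(1) = Re(h) and h^(2) = Im(h), and likewise for t and r:</p>
      <preformat>
import numpy as np

rng = np.random.default_rng(0)
K = 8
h, t, r = rng.normal(size=(3, K)) + 1j * rng.normal(size=(3, K))

complex_score = np.real(np.sum(h * np.conj(t) * r))

tri = lambda a, b, c: np.sum(a * b * c)
real_score = (tri(h.real, t.real, r.real) + tri(h.imag, t.imag, r.real)
              + tri(h.real, t.imag, r.imag) - tri(h.imag, t.real, r.imag))

assert np.allclose(complex_score, real_score)
      </preformat>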
      <p>For CP, note that the two role-based embedding vectors for
each entity can be mapped to the two embedding vectors in our
model and the relation embedding vector can be mapped to r^(1).
For CPh, further note that its data augmentation is equivalent
to adding the scores of the original triple and the inverse triple.
We can then map r^(a) to r^(2) to obtain the equivalence given in
Table 1. By symmetry between h and t, we can also obtain the
equivalent weight vector CPh equiv. 1.</p>
      <p>From this perspective, all four models DistMult, ComplEx,
CP, and CPh can be seen as special cases of the general
multi-embedding interaction mechanism. This provides an intuitive
perspective on using the embedding vectors in complicated
models. For the ComplEx model, instead of using a complex-valued
embedding vector, we can treat it as two real-valued
embedding vectors. These vectors can then be used directly in common
learning algorithms that take as input real-valued vectors rather
than complex-valued vectors. We also see that multiple
embedding vectors are a natural extension of single embedding vectors.
Given this insight, multiple embedding vectors can be
concatenated to form a longer vector for use in visualization and data
analysis, for example.</p>
    </sec>
    <sec id="sec-8">
      <title>Automatically learning weight vectors</title>
      <p>As we have noted, the weight vector ω plays an important role
in the model, because it determines how the interaction
mechanism is implemented and therefore how the specific model can
be derived. An interesting question is how to learn ω
automatically. One approach is to let the model learn ω together with
the embeddings in an end-to-end fashion. For a more detailed
examination of this idea, we will test different restrictions on the
range of ω by applying tanh(ω), sigmoid(ω), and softmax(ω).</p>
      <p>Note also that the weight vectors for related models are usually
sparse. We therefore enforce a sparsity constraint on ω by an
additional Dirichlet negative log-likelihood regularization loss:
L_dir = −λ_dir Σ_{i,j,k} (α − 1) log ω^(i,j,k),
where α is a hyperparameter controlling sparseness (a small α
will make the weight vector sparser) and λ_dir is the regularization
strength.</p>
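      <p>A sketch of this regularizer under the reconstructed form above (an assumption on our part): raw weights are first mapped onto the probability simplex via softmax, then penalized by the Dirichlet negative log-likelihood:</p>
      <preformat>
import numpy as np

def dirichlet_reg(omega, alpha=0.5, lambda_dir=1e-2):
    # Map raw weights onto the simplex (softmax), then apply the
    # Dirichlet negative log-likelihood; alpha below 1 favors sparsity.
    w = np.exp(omega - omega.max())
    w = w / w.sum()
    return -lambda_dir * np.sum((alpha - 1.0) * np.log(w + 1e-12))

omega = np.array([2.0, 0.1, -1.0, 0.3])
print(dirichlet_reg(omega))
      </preformat>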
    </sec>
    <sec id="sec-9">
      <title>Quaternion-based four-embedding interaction model</title>
      <p>Another question is whether using more embedding vectors in
the multi-embedding interaction mechanism is helpful.
Motivated by the derivation of ComplEx from a two-embedding
interaction model, we develop a four-embedding interaction model
by using quaternion algebra to determine the weight vector and
the interaction mechanism.</p>
      <p>
        Quaternion numbers are an extension of complex numbers to
four components [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Each quaternion number q, with one
real component a and three imaginary components b, c, d, can
be written as q = a + bi + cj + dk, where i, j, k are fundamental
quaternion units, similar to the imaginary number i in complex
algebra. As with complex conjugates, we also have the quaternion
conjugate q̄ = a − bi − cj − dk.
      </p>
      <p>
        An intuitive view of quaternion algebra is that each
quaternion number represents a 4-dimensional vector (or 3-dimensional
vector when the real component a = 0) and quaternion
multiplication is rotation of this vector in 4- (or 3-)dimensional space.
Compared to complex algebra, each complex number represents
a 2-dimensional vector and complex multiplication is rotation of
this vector in the 2-dimensional plane [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Several works have shown the benefit of using complex,
quaternion, or other hyper-complex numbers in the hidden layers of
deep neural networks [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. To the best of our knowledge,
this paper is the first to motivate and use quaternion numbers
for the embedding vectors of knowledge graph embedding.
      </p>
      <p>Quaternion multiplication is noncommutative; thus, there are
multiple ways to multiply three quaternion numbers in the
trilinear product. Here, we choose to write the score function of
the quaternion-based four-embedding interaction model as:</p>
      <p>S(h, t, r) = Re(⟨h, t̄, r⟩),
where h, t, r ∈ H^K.</p>
      <p>
        By expanding this formula using quaternion algebra [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and
mapping the four components of a quaternion number to the four
embeddings in the multi-embedding interaction model,
respectively, we can write the score function in the notation of the
multi-embedding interaction model as:
S(h, t, r) = ⟨h^(1), t^(1), r^(1)⟩ + ⟨h^(2), t^(2), r^(1)⟩ + ⟨h^(3), t^(3), r^(1)⟩ + ⟨h^(4), t^(4), r^(1)⟩
+ ⟨h^(1), t^(2), r^(2)⟩ − ⟨h^(2), t^(1), r^(2)⟩ + ⟨h^(3), t^(4), r^(2)⟩ − ⟨h^(4), t^(3), r^(2)⟩
+ ⟨h^(1), t^(3), r^(3)⟩ − ⟨h^(2), t^(4), r^(3)⟩ − ⟨h^(3), t^(1), r^(3)⟩ + ⟨h^(4), t^(2), r^(3)⟩
+ ⟨h^(1), t^(4), r^(4)⟩ + ⟨h^(2), t^(3), r^(4)⟩ − ⟨h^(3), t^(2), r^(4)⟩ − ⟨h^(4), t^(1), r^(4)⟩.
      </p>
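      <p>A sketch of this quaternion score, implementing the Hamilton product directly over embeddings stored as arrays of shape (K, 4). The conjugation of t and the multiplication order follow the reconstruction above and are our reading, not necessarily the original implementation:</p>
      <preformat>
import numpy as np

def qmul(p, q):
    # Hamilton product of quaternion arrays shaped (..., 4).
    a1, b1, c1, d1 = np.moveaxis(p, -1, 0)
    a2, b2, c2, d2 = np.moveaxis(q, -1, 0)
    return np.stack([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ], axis=-1)

def qconj(q):
    # Quaternion conjugate: negate the three imaginary components.
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def quaternion_score(h, t, r):
    # Sum of the real components of h * conj(t) * r over all K dimensions.
    return np.sum(qmul(qmul(h, qconj(t)), r)[..., 0])

rng = np.random.default_rng(0)
h, t, r = rng.normal(size=(3, 8, 4))
print(quaternion_score(h, t, r))
      </preformat>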
    </sec>
    <sec id="sec-10">
      <title>LOSS FUNCTION AND OPTIMIZATION</title>
      <p>
        The learning problem in knowledge graph embedding methods
can be modeled as the binary classification of valid and invalid
triples. Because knowledge graphs do not contain invalid triples,
we generate them by negative sampling [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. For each valid triple
(h, t, r), we replace h or t with
other random entities to obtain the invalid triples (h′, t, r) and
(h, t′, r) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
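      <p>A minimal sketch of this negative sampling step (illustrative; it ignores the rare case where the random entity reproduces a valid triple, which the filtered evaluation protocol of Section 5.2 handles separately):</p>
      <preformat>
import random

def corrupt(triple, entities):
    # Replace the head or the tail with a random entity [4].
    h, t, r = triple
    if random.random() >= 0.5:  # corrupt the head half the time
        return (random.choice(entities), t, r)  # (h', t, r)
    return (h, random.choice(entities), r)      # (h, t', r)

print(corrupt(("UserA", "Item1", "review"), ["UserA", "UserB", "Item1", "Item2"]))
      </preformat>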
      <p>We can then learn the model parameters by minimizing the
negative log-likelihood loss for the training data, with the
predicted probability modeled by the logistic sigmoid function σ(·)
on the matching score. This loss is the cross-entropy:
L(D, D′; Θ, ω) = −Σ_{(h,t,r)∈D} log σ(S(h, t, r; Θ, ω)) − Σ_{(h,t,r)∈D′} log(1 − σ(S(h, t, r; Θ, ω))), (15)
where D is true data (p̂ = 1), D′ is negative sampled data (p̂ = 0),
and p̂ is the empirical probability.</p>
      <p>Defining the class label Y_(h,t,r) = 2p̂_(h,t,r) − 1, i.e., the labels
of positive triples are 1 and of negative triples are −1, the above loss
can be written more concisely. Including the L2 regularization
of the embedding vectors, this loss can be written as:
L(D, D′; Θ, ω) = Σ_{(h,t,r)∈D∪D′} log(1 + e^{−Y_(h,t,r) S(h,t,r;Θ,ω)}) + λ Σ_{i=1}^{n} Σ_{d=1}^{D} ((h^(i)_d)^2 + (t^(i)_d)^2 + (r^(i)_d)^2), (16)
where D is true data, D′ is negative sampled data, Θ are the
embedding vectors corresponding to the specific current triples, n is
the number of embeddings per entity or relation, D is the
embedding size, and λ is the regularization strength.</p>
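      <p>A sketch of the loss in Eq. (16), assuming scores and +1/−1 labels as NumPy arrays (illustrative):</p>
      <preformat>
import numpy as np

def loss(scores, labels, embeddings, lam=1e-3):
    # Softplus form of the cross-entropy plus L2 regularization.
    data_term = np.sum(np.log1p(np.exp(-labels * scores)))
    reg_term = lam * sum(np.sum(e ** 2) for e in embeddings)
    return data_term + reg_term

scores = np.array([2.3, -0.7])      # one positive, one negative triple
labels = np.array([1.0, -1.0])
embeddings = [np.ones(8), np.ones(8)]
print(loss(scores, labels, embeddings))
      </preformat>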
    </sec>
    <sec id="sec-11">
      <title>EXPERIMENTAL SETTINGS</title>
    </sec>
    <sec id="sec-12">
      <title>Datasets</title>
      <p>
        For our empirical analysis, we used the WN18 dataset, the most
popular of the benchmark datasets built on WordNet [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] by
Bordes et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This dataset has 40,943 entities, 18 relations,
141,442 training triples, 5,000 validation triples, and 5,000 test triples.
In our preliminary experiments, the relative performance on
all datasets was quite consistent; therefore, choosing the WN18
dataset is appropriate for our analysis. We will consider the use
of other datasets in future work.
      </p>
    </sec>
    <sec id="sec-13">
      <title>Evaluation protocols</title>
      <p>
        Knowledge graph embedding methods are usually evaluated on
the link prediction task [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In this task, for each true triple (h, t, r ) in
the test set, we replace h and t by every other entity to generate
corrupted triples (h′, t, r ) and (h, t ′, r ), respectively [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The goal
of the model now is to rank the true triple (h, t, r ) before the
corrupted triples based on the predicted score S.
      </p>
      <p>
        For each true triple in the test set, we compute its rank; we
can then compute popular evaluation metrics including MRR (mean
reciprocal rank) and Hit@k for k ∈ {1, 3, 10} (how many true
triples are correctly ranked in the top k) [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ].
      </p>
      <p>
        To avoid false negative errors, i.e., corrupted triples that are
accidentally valid triples, we follow the protocols used in other works
for filtered metrics [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In this protocol, all valid triples in the
training, validation, and test sets are removed from the corrupted
triples set before computing the rank of the true triple.
      </p>
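      <p>A sketch of this filtered ranking protocol for tail corruption (head corruption is symmetric); score_fn and the triple store are illustrative assumptions:</p>
      <preformat>
def filtered_rank(score_fn, test_triple, entities, known_triples):
    # Rank of the true triple among its filtered tail-corrupted triples.
    h, t, r = test_triple
    true_score = score_fn(h, t, r)
    rank = 1
    for t_prime in entities:
        if t_prime == t or (h, t_prime, r) in known_triples:
            continue  # filter out other valid triples
        if score_fn(h, t_prime, r) > true_score:
            rank += 1
    return rank  # MRR averages 1/rank; Hit@k checks whether rank is at most k
      </preformat>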
    </sec>
    <sec id="sec-14">
      <title>Training</title>
      <p>
        We trained the models using SGD with learning rates auto-tuned
by Adam [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], which makes the choice of initial learning rate more
robust. For all models, we found good hyperparameters with
grid search on learning rates ∈ {10^−3, 10^−4}, embedding
regularization strengths ∈ {10^−2, 3 × 10^−3, 10^−3, 3 × 10^−4, 10^−4, 0.0},
and batch sizes ∈ {2^12, 2^14}. For a fair comparison, we fixed
the embedding sizes so that the numbers of parameters for all
models are comparable. In particular, we used embedding sizes of
400 for one-embedding models such as DistMult, 200 for
two-embedding models such as ComplEx, CP, and CPh, and 100 for
four-embedding models. We also fixed the number of negative
samples at 1 because, although using more negative samples
is beneficial for all models, it is also more expensive and not
necessary for this comparative analysis.
      </p>
      <p>We constrained entity embedding vectors to have unit L2-norm
after each training iteration. All training runs were stopped early
by checking the filtered MRR on the validation set every 50
epochs, with a patience of 100 epochs.</p>
    </sec>
    <sec id="sec-15">
      <title>RESULTS AND DISCUSSION</title>
      <p>In this section, we present experimental results and analyses for
the models described in Section 3. We report results for derived
weight vectors and their variants, auto-learned weight vectors,
and the quaternion-based four-embedding interaction model.
      </p>
      <p>6.1 Derived weight vectors and variants</p>
      <p>
        6.1.1 Comparison of derived weight vectors. We evaluated the
multi-embedding interaction model with the score function in Eq.
(8), using the derived weight vectors in Table 1. The results are
shown in Table 2. They are consistent with the results reported
in other works [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. Note that ComplEx and CPh achieved good
results, whereas DistMult performed less well. CP performed
very poorly in comparison to the other models, even though it is
a classical model for the tensor decomposition task [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>For a more detailed comparison, we report the performance on
the training data. Note that ComplEx and CPh can accurately predict
the training data, whereas DistMult cannot. This is evidence that
ComplEx and CPh are fully expressive while DistMult cannot
model asymmetric data effectively.</p>
      <p>The most surprising result was that CP can also accurately
predict the training data at a comparable level to ComplEx and
CPh , despite its very poor result on the test data. This suggests
that the problem with CP is not its modeling capacity but its
generalization performance to new test data. In other words, CP
is severely overfitting to the training data. However, standard
regularization techniques such as L2 regularization did not appear
to help. CPh can be seen as a regularization technique that does
help CP generalize well to unseen data.</p>
      <p>6.1.2 Comparison with other variants of weight vectors. In
Table 2, we show the results for two bad examples and two good
examples of weight vector variants. Note that bad example 1
performed similarly to CP and bad example 2 performed similarly
to DistMult. Good example 1 was similar to CPh and good example
2 was similar to ComplEx.</p>
      <p>This shows that the problem of bad weight vectors is not
unique to some specific models. Moreover, it suggests that good
weight vectors should satisfy the following properties:</p>
      <p>• Completeness: all embedding vectors in a triple should be
involved in the weighted-sum matching score.
• Stability: all embedding vectors for the same entity or
relation should contribute equally to the weighted-sum
matching score.
• Distinguishability: the weighted-sum matching scores for
different triples should be distinguishable. For example, the
score ⟨h^(1), t^(2), r^(1)⟩ + ⟨h^(2), t^(1), r^(2)⟩ is indistinguishable
because switching h and t forms a symmetric group.</p>
      <p>
        As an example, consider the ComplEx model, where the
multiplication of two complex numbers written in polar coordinate
format, c1 = |c1|e^{iθ1} and c2 = |c2|e^{iθ2}, can be written as
c1c2 = |c1||c2|e^{i(θ1+θ2)} [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This is a rotation in the complex
plane, which intuitively satisfies the above properties.
      </p>
    </sec>
    <sec id="sec-16">
      <title>Automatically learned weight vectors</title>
      <p>We let the models learn ω together with the embeddings in an
end-to-end fashion, aiming to learn good weight vectors
automatically. The results are shown in Table 3.</p>
      <p>We first set a uniform weight vector as a baseline. The results
were similar to those for DistMult because the weighted-sum
matching score is also symmetric. However, other
automatically learned weight vectors also performed similarly to
DistMult. Different restrictions applied via tanh(ω), sigmoid(ω),
and softmax(ω) did not help. We noticed that the learned weight
vectors were almost uniform, making them indistinguishable,
suggesting that the use of sparse weight vectors might help.</p>
      <p>We enforced a sparsity constraint by an additional Dirichlet
negative log-likelihood regularization loss on ω, with α tuned
to 1/16 and λ_dir tuned to 10^−2. However, the results did not
improve. Tracking the weight vector values showed that the sparsity
constraint seemed to amplify the initial differences between the
weight values instead of learning useful sparseness. This suggests
that the gradient information is so symmetric that the model
cannot break the symmetry of ω and escape the local optima.</p>
      <p>In general, these experiments show that learning good weight
vectors automatically is a particularly difficult task.</p>
    </sec>
    <sec id="sec-17">
      <title>Quaternion-based four-embedding interaction model</title>
      <p>In Table 4, we present the evaluation results for the proposed
quaternion-based four-embedding interaction model. The results
were generally positive, with most metrics higher than those in
Table 2 for state-of-the-art models such as ComplEx and CPh.
In particular, H@10 was much better than for the other models.</p>
      <p>Note that this model needs more extensive evaluation. One
potential problem is that it is prone to overfitting, as seen in the
results on the training data, with H@10 at an absolute 1.000. This
suggests that better regularization methods may be needed. However, the
general results suggest that extending multi-embedding interaction
models to more embedding vectors is a promising approach.</p>
    </sec>
    <sec id="sec-18">
      <title>CONCLUSION</title>
      <p>This paper proposes a multi-embedding interaction mechanism
as a new approach to analyzing state-of-the-art knowledge graph
embedding models such as DistMult, ComplEx, CP, and CPh. We
showed that these models can be unified and generalized under
the new approach to provide an intuitive perspective on using
the models and their embedding vectors effectively. We analyzed
and compared the models and their variants empirically to better
understand their properties, such as the severe overfitting
problem of the CP model. In addition, we proposed and evaluated
a new multi-embedding interaction model based on quaternion
algebra, which showed promising results.</p>
      <p>There are several promising future directions. One direction
is to find new methods of modeling the interaction mechanism
between multi-embedding vectors and the effective extension to
additional embedding vectors. Another direction is to evaluate
multi-embedding models such as the proposed quaternion-based
four-embedding interaction model more extensively.</p>
    </sec>
    <sec id="sec-19">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work was supported by a JSPS Grant-in-Aid for Scientific
Research (B) (15H02789).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Lars</surname>
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Ahlfors</surname>
          </string-name>
          .
          <year>1953</year>
          .
          <article-title>Complex Analysis: An Introduction to the Theory of Analytic Functions of One Complex Variable</article-title>
          . New York, London (
          <year>1953</year>
          ),
          <fpage>177</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Amit</given-names>
            <surname>Singhal</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Official Google Blog: Introducing the Knowledge Graph: Things, Not Strings</article-title>
          . https://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Kurt</given-names>
            <surname>Bollacker</surname>
          </string-name>
          , Colin Evans, Praveen Paritosh, Tim Sturge, and
          <string-name>
            <given-names>Jamie</given-names>
            <surname>Taylor</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge</article-title>
          . In SIGMOD Conference.
          <volume>1247</volume>
          -
          <fpage>1250</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Antoine</given-names>
            <surname>Bordes</surname>
          </string-name>
          , Nicolas Usunier, Alberto Garcia-Duran,
          <string-name>
            <given-names>Jason</given-names>
            <surname>Weston</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Oksana</given-names>
            <surname>Yakhnenko</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Translating Embeddings for Modeling Multi-Relational Data</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          .
          <volume>2787</volume>
          -
          <fpage>2795</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Walter</given-names>
            <surname>Carrer-Neto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>María</given-names>
            <surname>Luisa</surname>
          </string-name>
          Hernández-Alcaraz, Rafael Valencia-García, and Francisco García-Sánchez.
          <year>2012</year>
          .
          <article-title>Social Knowledge-Based Recommender System</article-title>
          .
          <article-title>Application to the Movies Domain</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>39</volume>
          ,
          <issue>12</issue>
          (Sept.
          <year>2012</year>
          ),
          <fpage>10990</fpage>
          -
          <lpage>11000</lpage>
          . https://doi.org/10.1016/j.eswa.
          <year>2012</year>
          .
          <volume>03</volume>
          .025
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Tim</given-names>
            <surname>Dettmers</surname>
          </string-name>
          , Pasquale Minervini, Pontus Stenetorp, and
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Riedel</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Convolutional 2d Knowledge Graph Embeddings</article-title>
          .
          <source>In Thirty-Second AAAI Conference on Artificial Intelligence.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Xin</given-names>
            <surname>Dong</surname>
          </string-name>
          , Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun,
          <string-name>
            <given-names>and Wei</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion</article-title>
          .
          <source>In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '14</source>
          . ACM Press, New York, New York, USA,
          <fpage>601</fpage>
          -
          <lpage>610</lpage>
          . https://doi.org/10.1145/2623330.2623623
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Ron</given-names>
            <surname>Goldman</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Rethinking Quaternions</article-title>
          .
          <source>Synthesis Lectures on Computer Graphics and Animation</source>
          <volume>4</volume>
          ,
          <issue>1</issue>
          (Oct.
          <year>2010</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>157</lpage>
          . https://doi.org/10.2200/ S00292ED1V01Y201008CGR013
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Nitzan</given-names>
            <surname>Guberman</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>On Complex Valued Convolutional Neural Networks</article-title>
          .
          <source>arXiv:1602.09046 [cs.NE] (Feb</source>
          .
          <year>2016</year>
          ).
          <source>arXiv:cs.NE/1602.09046</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Ruining</surname>
            <given-names>He</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang-Cheng Kang</surname>
          </string-name>
          , and
          <string-name>
            <surname>Julian McAuley</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Translation-Based Recommendation</article-title>
          .
          <source>In Proceedings of the Eleventh ACM Conference on Recommender Systems (RecSys '17)</source>
          . ACM, New York, NY, USA,
          <fpage>161</fpage>
          -
          <lpage>169</lpage>
          . https://doi.org/10.1145/3109859.3109882
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Geoffrey E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <year>1986</year>
          .
          <article-title>Learning Distributed Representations of Concepts</article-title>
          .
          <source>In Proceedings of the Eighth Annual Conference of the Cognitive Science Society</source>
          , Vol.
          <volume>1</volume>
          . Amherst, MA,
          <volume>12</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          , J. L. McClelland
          , and
          <string-name>
            <given-names>D E</given-names>
            <surname>Rumelhart</surname>
          </string-name>
          .
          <year>1984</year>
          .
          <article-title>Distributed Representations</article-title>
          .
          <source>In Parallel Distributed Processing</source>
          . Carnegie-Mellon University, Pittsburgh, PA,
          <volume>33</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Frank</surname>
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Hitchcock</surname>
          </string-name>
          .
          <year>1927</year>
          .
          <article-title>The Expression of a Tensor or a Polyadic as a Sum of Products</article-title>
          .
          <source>Journal of Mathematics and Physics 6</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          (
          <year>April 1927</year>
          ),
          <fpage>164</fpage>
          -
          <lpage>189</lpage>
          . https://doi.org/10.1002/sapm192761164
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Pennington</surname>
          </string-name>
          , Richard Socher, and
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>GloVe: Global Vectors for Word Representation</article-title>
          .
          <source>In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          .
          <volume>1532</volume>
          -
          <fpage>1543</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Isai Lvovich</given-names>
            <surname>Kantor</surname>
          </string-name>
          and
          <string-name>
            <given-names>Aleksandr Samuilovich</given-names>
            <surname>Solodovnikov</surname>
          </string-name>
          .
          <year>1989</year>
          .
          <article-title>Hypercomplex Numbers: An Elementary Introduction to Algebras</article-title>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Diederik P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jimmy</given-names>
            <surname>Ba</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Adam: A Method for Stochastic Optimization</article-title>
          .
          <source>In Proceedings of the 3rd International Conference on Learning Representations (ICLR).</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Timothée</surname>
            <given-names>Lacroix</given-names>
          </string-name>
          , Nicolas Usunier, and
          <string-name>
            <given-names>Guillaume</given-names>
            <surname>Obozinski</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Canonical Tensor Decomposition for Knowledge Base Completion</article-title>
          .
          <source>In Proceedings of the 35th International Conference on Machine Learning (ICML'18).</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Yankai</surname>
            <given-names>Lin</given-names>
          </string-name>
          , Zhiyuan Liu, Huanbo Luan, Maosong Sun,
          <string-name>
            <given-names>Siwei</given-names>
            <surname>Rao</surname>
          </string-name>
          , and Song Liu.
          <year>2015</year>
          .
          <article-title>Modeling Relation Paths for Representation Learning of Knowledge Bases</article-title>
          .
          <source>In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing.</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Yankai</surname>
            <given-names>Lin</given-names>
          </string-name>
          , Zhiyuan Liu, Maosong Sun, Yang Liu, and
          <string-name>
            <given-names>Xuan</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Learning Entity and Relation Embeddings for Knowledge Graph Completion</article-title>
          .
          <source>In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence</source>
          .
          <fpage>2181</fpage>
          -
          <lpage>2187</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Kai Chen, Greg Corrado, and
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Efficient Estimation of Word Representations in Vector Space</article-title>
          .
          <source>In ICLR'13 Workshop.</source>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Ilya Sutskever, Kai Chen, Greg S. Corrado, and
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Distributed Representations of Words and Phrases and Their Compositionality</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          .
          <volume>3111</volume>
          -
          <fpage>3119</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>George A.</given-names>
          </string-name>
          <year>1995</year>
          .
          <article-title>WordNet: A Lexical Database for English</article-title>
          .
          <source>Commun. ACM</source>
          (
          <year>1995</year>
          ),
          <fpage>39</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Toshifumi</surname>
            <given-names>Minemoto</given-names>
          </string-name>
          , Teijiro Isokawa, Haruhiko Nishimura, and
          <string-name>
            <given-names>Nobuyuki</given-names>
            <surname>Matsui</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Feed Forward Neural Network with Random Quaternionic Neurons</article-title>
          .
          <source>Signal Processing</source>
          <volume>136</volume>
          (
          <year>2017</year>
          ),
          <fpage>59</fpage>
          -
          <lpage>68</lpage>
          . https://doi.org/10.1016/j. sigpro.
          <year>2016</year>
          .
          <volume>11</volume>
          .008
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Maximilian</surname>
            <given-names>Nickel</given-names>
          </string-name>
          , Volker Tresp, and
          <string-name>
            <surname>Hans-Peter Kriegel</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>A Three-Way Model for Collective Learning on Multi-Relational Data</article-title>
          .
          <source>In Proceedings of the 28th International Conference on Machine Learning</source>
          .
          <fpage>809</fpage>
          -
          <lpage>816</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Titouan</surname>
            <given-names>Parcollet</given-names>
          </string-name>
          , Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Chiheb Trabelsi, Renato De Mori, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Quaternion Recurrent Neural Networks</article-title>
          .
          <source>In Proceedings of the International Conference on Learning Representations (ICLR'19).</source>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Richard</given-names>
            <surname>Socher</surname>
          </string-name>
          , Danqi Chen,
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          , and Andrew Y Ng.
          <year>2013</year>
          .
          <article-title>Reasoning With Neural Tensor Networks for Knowledge Base Completion</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          .
          <fpage>926</fpage>
          -
          <lpage>934</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Théo</given-names>
            <surname>Trouillon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Christopher R.</given-names>
            <surname>Dance</surname>
          </string-name>
          , Éric Gaussier, Johannes Welbl,
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Riedel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Guillaume</given-names>
            <surname>Bouchard</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Knowledge Graph Completion via Complex Tensor Factorization</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          <volume>18</volume>
          ,
          <issue>1</issue>
          (
          <year>2017</year>
          ),
          <fpage>4735</fpage>
          -
          <lpage>4772</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Théo</given-names>
            <surname>Trouillon</surname>
          </string-name>
          , Johannes Welbl,
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Riedel</surname>
          </string-name>
          , Éric Gaussier, and
          <string-name>
            <given-names>Guillaume</given-names>
            <surname>Bouchard</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Complex Embeddings for Simple Link Prediction</article-title>
          .
          <source>In International Conference on Machine Learning (ICML'16)</source>
          .
          <fpage>2071</fpage>
          -
          <lpage>2080</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Denny</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          and
          <string-name>
            <given-names>Markus</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Wikidata: A Free Collaborative Knowledgebase</article-title>
          .
          <source>Commun. ACM</source>
          <volume>57</volume>
          ,
          <issue>10</issue>
          (Sept.
          <year>2014</year>
          ),
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          . https://doi.org/10.1145/2629489
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Guo</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Knowledge Graph Embedding: A Survey of Approaches and Applications</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>29</volume>
          ,
          <issue>12</issue>
          (Dec.
          <year>2017</year>
          ),
          <fpage>2724</fpage>
          -
          <lpage>2743</lpage>
          . https://doi.org/10.1109/TKDE.2017.2754499
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Yanjie</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Rainer</given-names>
            <surname>Gemulla</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Hui</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>On Multi-Relational Link Prediction with Bilinear Models</article-title>
          .
          <source>In Thirty-Second AAAI Conference on Artificial Intelligence.</source>
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Zhen</given-names>
            <surname>Wang</surname>
          </string-name>
          , Jianwen Zhang, Jianlin Feng, and
          <string-name>
            <given-names>Zheng</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Knowledge Graph and Text Jointly Embedding</article-title>
          .
          <source>In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing</source>
          .
          <fpage>1591</fpage>
          -
          <lpage>1601</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Zhen</given-names>
            <surname>Wang</surname>
          </string-name>
          , Jianwen Zhang, Jianlin Feng, and
          <string-name>
            <given-names>Zheng</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Knowledge Graph Embedding by Translating on Hyperplanes</article-title>
          .
          <source>In AAAI Conference on Artificial Intelligence. Citeseer</source>
          ,
          <fpage>1112</fpage>
          -
          <lpage>1119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>Han</given-names>
            <surname>Xiao</surname>
          </string-name>
          , Minlie Huang,
          <string-name>
            <given-names>Yu</given-names>
            <surname>Hao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Xiaoyan</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>TransA: An Adaptive Approach for Knowledge Graph Embedding</article-title>
          .
          <source>In AAAI Conference on Artificial Intelligence. arXiv:1509.05490</source>
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>Bishan</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Wen-tau</given-names>
            <surname>Yih</surname>
          </string-name>
          , Xiaodong He, Jianfeng Gao, and
          <string-name>
            <given-names>Li</given-names>
            <surname>Deng</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Embedding Entities and Relations for Learning and Inference in Knowledge Bases</article-title>
          .
          <source>In International Conference on Learning Representations.</source>
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>Fuzheng</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Nicholas Jing Yuan, Defu Lian, Xing Xie, and
          <string-name>
            <given-names>Wei-Ying</given-names>
            <surname>Ma</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Collaborative Knowledge Base Embedding for Recommender Systems</article-title>
          .
          <source>In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          . ACM Press,
          <fpage>353</fpage>
          -
          <lpage>362</lpage>
          . https://doi.org/10.1145/2939672.2939673
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>