<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Predicting Asymmetric Transitive Relations in Knowledge Bases</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pushpendre Rastogi</string-name>
          <email>pushpendre@jhu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benjamin Van Durme</string-name>
          <email>vandurme@cs</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Language and Speech Processing</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Human Language Technology Center of Excellence Johns Hopkins University</institution>
        </aff>
      </contrib-group>
      <fpage>7</fpage>
      <lpage>12</lpage>
      <abstract>
        <p>Knowledge Base Completion (KBC), or link prediction, is the task of inferring missing edges in an existing knowledge graph. Although a number of methods have been evaluated empirically on select datasets for KBC, much less attention has been paid to understanding the relationship between the logical properties encoded by a given KB and the KBC method being evaluated. In this paper we study the effect of the logical properties of a relation on the performance of a KBC method, and we present a theorem and empirical results that can guide researchers in choosing the KBC algorithm for a KB.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Copyright c by the paper’s authors. Copying permitted for private and academic purposes.</p>
      <p>In: L. Dietz, C. Xiong, E. Meij (eds.): Proceedings of the First Workshop on Knowledge Graphs and Semantics for Text Retrieval and Analysis (KG4IR),
Tokyo, Japan, 11-Aug-2017, published at http://ceur-ws.org</p>
      <p>1It was reported by [Dong et al., 2014] in October 2013, that 71% of people in Freebase had no known place of birth and that 75% had no known
nationality.</p>
    </sec>
    <sec id="sec-2">
      <title>Theoretical Analysis</title>
      <p>Notation: A KB contains (subject, relation, object) triples. Each triple encodes the fact that a subject entity is related to an
object through a particular relation. Let V and R denote the set of entities and relationships. We use V to denote entities
to evoke the notion that an entity corresponds to a vertex in the knowledge graph. We assume that R includes a type for
the null relation or no relation. Let V = jVj and R = jRj denote the number of entities and relations. We use v and r to
denote a generic entity and relation respectively. The shorthand [n] denotes fxj1 x n; x 2 Ng. Let E denote the entire
collection of facts and let e denote a generic element of E . Each instance of e is an edge in the knowledge graph. We refer to
the subject, object and relation of e as esub; eobj 2 V and erel 2 R respectively. E = jE j is the number of known triples.</p>
      <p>RESCAL: The RESCAL model associates each entity v with the vector av 2 Rd and it represents the relation r through
the matrix Mr 2 Rd d. Let v and v0 denote two entities whose relationship is unknown, and let s(v; r; v0) = avT Mrav0 ,
then the RESCAL model predicts the relation between v and v0 to be: r^ = argmaxr2R s(v; r; v0). Note that in general
if the matrix Mr is asymmetric then the score function s would also be asymmetric, i.e., s(v; r; v0) 6= s(v0; r; v). Let
= favjv 2 Vg [ fMrjr 2 Rg.</p>
      <p>Transitive Relations and RESCAL: In addition to relational information about the binary connections between entities,
many KBs contain information about the relations themselves. For example, consider the toy knowledge base depicted
in Figure 2a. Based on the information that Fluffy is-a Dog and that a Dog is-a Animal and that is-a is a transitive
relations we can infer missing relations such as Fluffy is-a Animal.</p>
      <p>Let us now analyze what happens when we encode a transitive, asymmetric relation. Consider the situation where the
set R only contains two relations fr0; r1g. r1 denotes the presence of the is-a relation and r0 denotes the absence of that
relation. The embedding based model can only follow the chain of transitive relations and infer missing edges using existing
information in the graph if for all triples of vertices v; v0; v00 in V for which we have observed (v, is-a, v0) and (v0, is-a, v00)
the following holds true:</p>
      <p>s(v; r1; v0)&gt;s(v; r0; v0) and s(v0; r1; v00)&gt;s(v0; r0; v00) =) s(v; r1; v00)&gt;s(v; r0; v00)</p>
      <p>I.e. avT (Mr1 Mr0 )av0 &gt; 0 and avT0 (Mr1 Mr0 )av00 &gt; 0 =) avT (Mr1
We now define a transitive matrix and state a theorem that we prove in Section 6.</p>
      <p>Mr0 )av00 &gt; 0
(1)
Definition A matrix M 2 Rd d is transitive if aT M b &gt; 0 and bT M c &gt; 0 implies aT M c &gt; 0.</p>
      <sec id="sec-2-1">
        <title>Theorem 1. Every transitive matrix is symmetric.</title>
        <p>If we enforce the constraint in Equation 1 to hold for all possible vectors and not just a finite number of vectors then
Mr1 Mr0 is a transitive matrix. By Theorem 1, Mr1 Mr0 must be symmetric. This further implies that if s(v; r1; v0) &gt;
s(v; r0; v0) then s(v0; r1; v) &gt; s(v0; r0; v). In terms of the toy KB shown in Figure 2a; if the RESCAL model predicts that
Fluffy is-a Animal then it will also predict that Animal is-a Fluffy.</p>
        <p>Augmenting RESCAL to Encode Transitive Relations: The analysis above points to a simple way for improving
RESCAL’s performance on asymmetric, transitive relations. The reason that the original method fails to satisfactorily
encode transitive asymmetric relations is because if the score s(v; r1; v0) is high then s(v0; r1; v) will also be high. We
can avoid this situation by using two different embeddings for all the entities and compute the score of a relation through
those role specific embeddings; i.e. we can use the embeddings a1v; a2v to represent vertex v and let s(v; r1; v0) = a1vMr1 av0
2
and s(v0; r1; v) = a1v0 Mr1 av. This idea of using role specific embeddings has been known for a long time starting
2
from [Tucker, 1966]. 2 In fact the specific method that we have just explained is generally known to KBC researchers as
the Tucker2 decomposition [Singh et al., 2015]. In order to encode more than one relations, only the matrix Mr needs to
change but the entity embeddings can be shared across all relations.
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>Due to the large body of work that has been done for the task of KBC it is not possible to cover all of the related work on
KBC in this section. Instead, we refer the reader to the survey [Nickel et al., 2016a] for an overview of the empirical work
that has been done in the area of KBC and link prediction.</p>
      <p>Since we focus on the analysis of RESCAL, our work is most closely related to the paper [Nickel et al., 2014]. This paper
proves an important theorem that shows that the dimensionality required by the RESCAL model3 for exactly representing
a weighted adjacency matrix of a knowledge graph must be greater than the number of strongly connected components
in the graph. In our setting where we consider data sets that contain only transitive-asymmetric relations, the number of
2Recently [Yoon et al., 2016] used this idea of using role specific embeddings to preserve the properties of symmetry and transitivity in translation
based knowledge base embeddings.</p>
      <p>3Actually their theorem provides a lower bound for a more general model than RESCAL which automatically applies to RESCAL.
strongly connected components in the graph equal the number of vertices in the graph. Therefore their theorem proves that
the dimensionality required for exactly representing a dataset such as WordNet using an algorithm such as RESCAL must be
greater than the number of entities in the knowledge graph. In contrast to this result, our analysis gives an explicit example
of a type of query for which the RESCAL algorithm will make wrong inferences.</p>
      <p>Our analysis trivially extends to a few other factorization based algorithms e.g. the Holographic embedding algorithm
by [Nickel et al., 2016b]. The holographic embedding method can be rewritten as a constrained form of RESCAL with a
“holographically constrained” matrix M . Figure 1 shows an example of a 3 3 holographically constrained matrix with the
constraint that elements with the same color must hold the same value. Since such a matrix is asymmetric by construction,
our theorem proves that there will exist vectors a; b, and c for which M will violate transitivity.</p>
      <p>m1
m3
m2
m2
m1
m3
m3
m2
m1</p>
      <p>Recently [Bouchard et al., 2015] argued that the phenomenon of transitivity of relations between vertices in a knowledge
graph can be modeled with high accuracy if the knowledge graph is modeled as a thresholded version of a latent low rank
real matrix, and the vertex embeddings are learnt as a low rank factorization of that latent matrix. Based on this argument
they claimed that factorizing a knowledge graph with a squared loss was less appropriate in comparison to factorizing it with
a hinge loss or logistic loss. In this work we provide an argument based on the symmetry of transitive matrices to show that
the method of RESCAL which minimizes the squared reconstruction error must fail to capture phenomenon like transitivity
in large knowledge bases. In this way, our results complement the work by [Bouchard et al., 2015].
4</p>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>Our theoretical result in Section 2 was derived under the assumption that the constraint 1 held over all vectors in Rd instead
of just the finite number of vector triples used to encode the KB triples. It is intuitive that as the number of entities inside a
KB increases our assumption will become an increasingly better approximation of reality. Therefore our theory predicts that
the performance of the RESCAL model will degrade as the number of entities inside the KB increases and the dimensionality
of the embeddings remains constant. We perform experiments to test this prediction of our theory.
4.1</p>
      <p>Experiments On Simulated Data
We tested the applicability of our analysis by the following experiments: We started with a complete, balanced, rooted,
directed binary tree T , with edges directed from the root to its children. We then augmented T as follows: For every tuple of
distinct vertices v, v0 we added a new edge to T if there already existed a directed path starting at v and ending at v0 in
T . We stopped when we could not add any more edges without creating multi-edges. For the rest of the paper we denote
this resulting set of ordered pairs of vertices as E and those pairs of vertices that are not in E as E c. For a tree of depth
11, V = 2047; E = 18; 434 and jE cj = 4; 171; 775. See Figure 2b for an example of E ; E c. We trained the RESCAL
model under two settings: In the first setting, called FullSet, we used entire E and E c for training. In the second setting,
called SubSet, we randomly sample E c and select only E = jE j edges from E c. All the edges in E including all the edges in
the original tree are always used during both FullSet and SubSet. For both the settings of FullSet and SubSet we trained
RESCAL 5 times and evaluated the models’ predictions on E , E c and E (rev). E ; E c have already been defined, and E (rev) is
the set of reversed ordered pairs in E . I.e., E rev = f(u; v)j(v; u) 2 E g</p>
      <p>For every edge in these three subsets we evaluated the model’s performance under 0 1 loss. Specifically, to evaluate the
performance of RESCAL on an edge (v; v0) 2 E we checked whether the model assigns a higher score to (v; r1; v0) than
(v; r0; v0) and rewarded the model by 1 point if it made the right prediction and 0 otherwise. As before, r1 and r0 denote the
presence and absence of relationship respectively.</p>
      <p>Note that low performance on E rev and high performance on E will indicate exactly the type of failure predicted from our
analysis. We vary the dimensionality of the embedding d, and the number of entities V, since they influence the performance
of the model, and present the results in Table 1a–1b. The right most column of Table 1b is the most direct empirical evidence
of our theoretical analysis. The performance of RESCAL embeddings is substantially lower on E rev in comparison to E ; E c.
The last row with d = 400 however shows a very sharp drop in the accuracy on E c while the performance of E rev increases
Fluffy</p>
      <p>Dog
(a) A toy knowledge base containing only is-a relations. The
dashed edges indicate unobserved relations that can be recovered
using the observed edges and the fact that is-a is a transitive
relation.</p>
      <p>(b) Assume that the black edges constitute E and the red dotted
denote Ec, then Erev contains the edges (v4; v1); (v4; v2); (v2; v1),
and (v3; v1).
slightly. We believe that this happens because of higher overfitting to the forward edges as the number of parameters
increases.</p>
      <p>FullSet
4095</p>
      <p>8191
d</p>
      <p>V = 2047
WordNet is a KB that contains vertices called synsets that are arranged in a tree like hierarchy under the hyponymy relation.
The hyponym of a synset is another synset that contains elements that have a more specific meaning. For example, the dog
synset4 is a hyponym of the animal synset and an animal is a hyponym of living thing therefore a dog is a hyponym of
living thing. We extracted all the hyponyms of the living thing.n.01 synset as the vertices of T and we used the transitive
closure of the direct hyponym relationship between two synsets as the edges of T . Quantitatively, the living thing synset
contained 16; 255 hyponyms, and 16; 489 edges. After performing the transitive closure E became 128; 241.</p>
      <p>We performed two experiments with the WordNet graphs, using the same FullSet and SubSet protocols described earlier.
The results are in the left half of Table 2. We see that even though the accuracy on E and E c is high, the performance on E rev
is much lower. This trend is in line with our theoretical prediction that the RESCAL model will fail on “reverse relations” as
the KB’s size increases.</p>
      <p>d</p>
      <sec id="sec-4-1">
        <title>FullSet</title>
      </sec>
      <sec id="sec-4-2">
        <title>SubSet</title>
      </sec>
      <sec id="sec-4-3">
        <title>SubSet</title>
        <p>and with a high dimensionality of embeddings it is possible to encode both the forward and the reverse relations in the
embeddings. Please note that we do not claim that our proposed augmentation for RESCAL will empirically be any better than the
much more recently proposed methods such as ARE [Nickel et al., 2014], or Poincare´ embeddings [Nickel and Kiela, 2017].
We leave a careful empirical comparison of these techniques for future work.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>The information present in large scale knowledge bases has helped in moving information retrieval beyond retrieval of
documents to more specific entities and objects. And in order to further improve coverage of knowledge bases it is important
to research knowledge base completion methods. Since many knowledge bases contain information about real world artifacts
that obey hierarchical relations and logical properties, it is important to keep such properties in mind while designing
knowledge base completion algorithms. In this paper we demonstrate a close connection between logical properties of
relations such as asymmetry, and transitivity, and the performance of KBC algorithms used to predict those relations.
Specifically, we theoretically analyzed a popular KBC algorithm named RESCAL, and our analysis showed that the
performance of that model in encoding transitive and asymmetric relations must degrade as the size of the KB increases.
Our experimental results in Table 1a,1b and 2 confirmed our theoretical hypothesis, and most strikingly we observed that the
accuracy of RESCAL on E rev was substantially lower than its performance on either E or E c, even though E rev is a subset
of E c.</p>
      <p>In Table 3, we visualize the errors made by RESCAL by listing a few edges in E rev that were wrongly predicted as true
edges. These examples show that the trained RESCAL model can predict that fruit tree is a hyponym of mango or that every
accountant is a bean counter. Such wrong predictions can be harmful. Based on our analysis, we advocated for role specific
embeddings as a way of alleviating this shortcoming of RESCAL and we empirically showed its efficacy in Table 2.</p>
      <p>Our results also highlight a problem with the commonly employed KBC evaluation protocol of randomly dividing the
edge set of a graph into train and test sets for measuring knowledge base completion accuracy. For example with d = 50 the
average accuracy on both E and E c is quite high but on E rev accuracy is low even though E rev is a subset of E c. Such a
failure will stay undetected with existing evaluation methods.
We now present our novel proof of Theorem 1 beginning with a lemma. 5</p>
      <sec id="sec-5-1">
        <title>Lemma 2. Every transitive matrix is PSD.</title>
        <p>Proof. Consider the triplet of vectors c := x; b := M c; a := M b. Then aT (M b) = jjM bjj2 0 and bT (M c) = jjbjj2 0
and aT M c = bT M b. Three cases are possible, either b = 0, or M b = 0, or both M b 6= 0 and b 6= 0. In the third case
transitivity applies and we conclude that bT M b &gt; 0. In all cases bT M b 0 which implies M is PSD.</p>
        <p>The next lemma proves that if M is transitive then xT M y and xT M T y must have the same sign.</p>
        <p>Lemma 3. If 9x; y xT M y &gt; 0 but xT M T y &lt; 0 then M is not transitive.</p>
        <p>Proof. Let x; y be two vectors that satisfy xT M y &gt; 0 and xT M T y &lt; 0. Since xT M T y = yT M x therefore yT M ( x) &gt;
0. If we assume M is transitive, then xT M ( x) &gt; 0 by transitivity, but Lemma 2 shows such an x cannot exist.</p>
        <p>Lemma 4 is a general statement about all matrices which states that if the two bilinear forms have the same sign for all
inputs then they have to be scalar multiples of each other. We omit its proof due to space constraint.
Lemma 4. Let M1; M2 2 Rd d n f0g. If xT M1y&gt;0 =) xT M2y&gt;0 then M1 = M2 for some &gt; 0.</p>
        <p>Finally we use Lemma 3–4 to prove Theorem 1.</p>
        <p>Proof. Let M be a transitive matrix and let x; y be two vectors such that xT M y &gt; 0. By transitivity of M and Lemma 3
xT M T y &gt; 0. Therefore by Lemma 4 we get M = M T for some &gt; 0. Clearly = 1, this concludes the proof that M
is symmetric.</p>
        <p>5Theorem 1 was first proven by [Grinberg, 2015](unpublished). Our proof is more elementary and direct.</p>
        <p>MathOverflow.
[Guha, 2015] Guha, R. (2015). Towards a model theory for distributed representations. In AAAI Spring Symposium Series.
[Miller, 1995] Miller, G. A. (1995). Wordnet: a lexical database for english. Communications of the ACM, 38(11):39–41.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Bollacker et al.,
          <year>2008</year>
          ] Bollacker,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Paritosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Sturge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            , and
            <surname>Taylor</surname>
          </string-name>
          , J. (
          <year>2008</year>
          ).
          <article-title>Freebase: a collaboratively created graph database for structuring human knowledge</article-title>
          .
          <source>In Proceedings of the SIGMOD</source>
          , pages
          <fpage>1247</fpage>
          -
          <lpage>1250</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Bouchard et al.,
          <year>2015</year>
          ] Bouchard,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            , and
            <surname>Trouillon</surname>
          </string-name>
          ,
          <string-name>
            <surname>T.</surname>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>On approximate reasoning capabilities of low-rank vector spaces</article-title>
          .
          <source>In 2015 AAAI Spring Symposium Series.</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Dalton et al.,
          <year>2014</year>
          ] Dalton,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Dietz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            , and
            <surname>Allan</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Entity query feature expansion using knowledge base links</article-title>
          .
          <source>In Proceedings of the 37th SIGIR</source>
          , pages
          <fpage>365</fpage>
          -
          <lpage>374</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Dong et al.,
          <year>2014</year>
          ] Dong,
          <string-name>
            <given-names>X.</given-names>
            ,
            <surname>Gabrilovich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Heitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Horn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            ,
            <surname>Lao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Murphy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Strohmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
          , and Zhang,
          <string-name>
            <surname>W.</surname>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Knowledge vault: A web-scale approach to probabilistic knowledge fusion</article-title>
          .
          <source>In Proceedings of the 20th SIGKDD</source>
          , pages
          <fpage>601</fpage>
          -
          <lpage>610</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>[Grinberg</source>
          , 2015] Grinberg,
          <string-name>
            <surname>D.</surname>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Existence and characterization of transitive matrices</article-title>
          ? URL:http://mathoverflow.net/q/212808 (version:
          <fpage>2015</fpage>
          -08-01).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Nickel et al.,
          <year>2014</year>
          ] Nickel,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            , and
            <surname>Tresp</surname>
          </string-name>
          ,
          <string-name>
            <surname>V.</surname>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Reducing the rank of relational factorization models by including observable patterns</article-title>
          .
          <source>In Proceedings of the 27th International Conference on Neural Information Processing Systems</source>
          , pages
          <fpage>1179</fpage>
          -
          <lpage>1187</lpage>
          . MIT Press.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>[Nickel and Kiela</source>
          , 2017] Nickel,
          <string-name>
            <given-names>M.</given-names>
            and
            <surname>Kiela</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Poincare´ embeddings for learning hierarchical representations</article-title>
          .
          <source>arXiv preprint arXiv:1705</source>
          .
          <fpage>08039</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Nickel et al., 2016a]
          <string-name>
            <surname>Nickel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murphy</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tresp</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Gabrilovich</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2016a</year>
          ).
          <article-title>A review of relational machine learning for knowledge graphs</article-title>
          .
          <source>Proceedings of the IEEE</source>
          ,
          <volume>104</volume>
          (
          <issue>1</issue>
          ):
          <fpage>11</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Nickel et al., 2016b]
          <string-name>
            <surname>Nickel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosasco</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Poggio</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2016b</year>
          ).
          <article-title>Holographic embeddings of knowledge graphs</article-title>
          .
          <source>In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence</source>
          , pages
          <fpage>1955</fpage>
          -
          <lpage>1961</lpage>
          . AAAI Press.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Nickel et al.,
          <year>2011</year>
          ] Nickel,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Tresp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            , and
            <surname>Kriegel</surname>
          </string-name>
          ,
          <string-name>
            <surname>H.-P.</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>A three-way model for collective learning on multi-relational data</article-title>
          .
          <source>In Proceedings of the 28th ICML</source>
          , pages
          <fpage>809</fpage>
          -
          <lpage>816</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Ritter et al.,
          <year>2011</year>
          ] Ritter,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Mausam</surname>
          </string-name>
          , and
          <string-name>
            <surname>Etzioni</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Named entity recognition in tweets: An experimental study</article-title>
          .
          <source>In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11</source>
          , pages
          <fpage>1524</fpage>
          -
          <lpage>1534</lpage>
          , Stroudsburg, PA, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>[Singh</surname>
          </string-name>
          et al.,
          <year>2015</year>
          ] Singh,
          <string-name>
            <given-names>S.</given-names>
            , Rockta¨schel, T., and
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Towards combined matrix and tensor factorization for universal schema relation extraction</article-title>
          .
          <source>In Workshop on VSM for NLP</source>
          , pages
          <fpage>135</fpage>
          -
          <lpage>142</lpage>
          . ACL.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Toutanova et al.,
          <year>2015</year>
          ] Toutanova,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Pantel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Poon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Choudhury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            , and
            <surname>Gamon</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Representing text for joint embedding of text and knowledge bases</article-title>
          .
          <source>In Proceedings of the EMNLP</source>
          , pages
          <fpage>1499</fpage>
          -
          <lpage>1509</lpage>
          . ACL.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <source>[Tucker</source>
          , 1966] Tucker,
          <string-name>
            <surname>L. R.</surname>
          </string-name>
          (
          <year>1966</year>
          ).
          <article-title>Some mathematical notes on three-mode factor analysis</article-title>
          .
          <source>Psychometrika</source>
          ,
          <volume>31</volume>
          (
          <issue>3</issue>
          ):
          <fpage>279</fpage>
          -
          <lpage>311</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <source>[Yao and Van Durme</source>
          ,
          <year>2014</year>
          ] Yao,
          <string-name>
            <given-names>X.</given-names>
            and
            <surname>Van Durme</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.</surname>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Information extraction over structured data: Question answering with freebase</article-title>
          .
          <source>In Proceedings of the 52nd ACL</source>
          , pages
          <fpage>956</fpage>
          -
          <lpage>966</lpage>
          , Baltimore, Maryland. ACL.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Yoon et al.,
          <year>2016</year>
          ] Yoon, H.-G.,
          <string-name>
            <surname>Song</surname>
          </string-name>
          , H.-J.,
          <string-name>
            <surname>Park</surname>
          </string-name>
          , S.-B., and
          <string-name>
            <surname>Park</surname>
          </string-name>
          , S.-Y. (
          <year>2016</year>
          ).
          <article-title>A translation-based knowledge graph embedding preserving logical property of relations</article-title>
          .
          <source>In Proceedings of NAACL-HLT</source>
          , pages
          <fpage>907</fpage>
          -
          <lpage>916</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>