<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Probabilistic Latent-Factor Database Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Denis Krompaß</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xueyian Jiang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maximilian Nickel</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Volker Tresp</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ludwig Maximilian University of Munich</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Massachusetts Institute of Technology</institution>
          ,
          <addr-line>Cambridge, MA and Istituto Italiano di Tecnologia, Genova</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Siemens AG, Corporate Technology</institution>
          ,
          <addr-line>Munich</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We describe a general framework for modelling probabilistic databases using factorization approaches. The framework includes tensor-based approaches, which have been very successful in modelling triple-oriented databases, and also includes recently developed neural network models. We consider the cases in which the target variable models the existence of a tuple, a continuous quantity associated with a tuple, a multiclass variable, or a count variable. We discuss appropriate cost functions with different parameterizations and optimization approaches. We argue that, in general, a combination of models leads to the best predictive results. We present experimental results on the modelling of existential variables and count variables.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Tensor models have been shown to efficiently model triple-oriented databases [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
where the main goal is to predict the probability for the existence of a triple.
Here we generalize the approach in several directions. First, we show that
relations of any arity can be modelled, not just triple stores. Second, we show that
any set of target variables associated with a tuple can be modelled. As
examples, one might predict the rating of a user for an item, the amount of a
specific medication for a patient, or the number of times that team A played against
team B. In each of these cases a different likelihood model might be appropriate,
and we discuss different likelihood functions, their different parameterizations,
and learning algorithms. Third, we discuss a more general framework that
includes recently developed neural network models [
        <xref ref-type="bibr" rid="ref1 ref13">13, 1</xref>
        ]. Finally, we argue that
model combinations sometimes offer greater flexibility and predictive power. We
present experimental results on the modelling of existential variables and count
variables using different likelihood models.
      </p>
      <p>The paper is organized as follows. In the next section we describe the
probabilistic setting, and in Section 3 we introduce the factorization framework and
some specific models. In Section 4 we describe the learning rules, and Section 5
contains our experimental results. Section 6 describes extensions. Section 7
contains our conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>Probabilistic Database Models</title>
      <sec id="sec-2-1">
        <title>Database Notation</title>
        <p>Consider a database as a set of $M$ relations $\{r_k\}_{k=1}^{M}$. A relation is a table with attributes as columns and tuples $E_i = \{e_{l(i,1)}, e_{l(i,2)}, \ldots, e_{l(i,L_k)}\}$ as rows, where $L_k$ is the number of attributes or the arity of the relation $r_k$, and $l(i,i')$ is the index of the domain entity in tuple $E_i$ in column $i'$. A relation $r_k$ is closely related to the predicate $r_k(E_i)$, which is a function that maps a tuple to true (or 1) if $E_i$ belongs to the relation and to false (or 0) otherwise. We model a triple $(s,p,o)$ as a binary relation $p$ where the first column is the subject $s$ and the second column is the object $o$.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Probabilistic Database Model</title>
        <p>We now associate with each instantiated relation $r_k(E_i)$ a target quantity $x_i^k$. Formally, we increase the arity of the relation by the dimension of $x_i^k$, so a binary relation would become a ternary relation if $x_i^k$ is a scalar. Here, the target $x_i^k$ can model different quantities. It can stand for the fact that the tuple exists ($x_i^k = 1$) or does not exist ($x_i^k = 0$), i.e., we model the predicate. In another application $E_i$ might represent a user/item pair and $x_i^k$ is the rating of the user for the item. Alternatively, $x_i^k$ might be a count, for example the number of times that the relation $r_k(E_i)$ has been observed. In the following we form predictive models for $x_i^k$; thus we can predict, e.g., the likelihood that a tuple is true, the rating of a user for an item, or the number of times that relation $r_k(E_i)$ has been observed.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Likelihood Functions and Cost Functions</title>
        <p>Convenient likelihood functions originate from the (overdispersed) exponential
family of distributions.</p>
        <p>Bernoulli. The Bernoulli model is appropriate if the goal is to predict the existence of a tuple, i.e., if we model the predicate. With $x_i^k \in \{0,1\}$, we model $P(x_i^k = 1 \mid \theta_i^k) = \theta_i^k$ with $0 \le \theta_i^k \le 1$. From this equation we can derive the penalized log-likelihood cost function
$$\mathrm{loss}^{Be}_{i,k} = -(x_i^k + \alpha_i^k - 1)\log\theta_i^k - (K_i^k - x_i^k + \beta_i^k)\log(1 - \theta_i^k). \tag{1}$$
Here, $K_i^k = 0$, and $\alpha_i^k &gt; 0$ and $\beta_i^k &gt; 0$ are derived from the conjugate beta distribution and can represent virtual data, in the sense that they represent $\alpha_i^k - 1$ additional observations of $x_i^k = 1$ and $\beta_i^k - 1$ additional observations of $x_i^k = 0$. The contribution of the prior drops out with $\alpha_i^k = 1$, $\beta_i^k = 1$.</p>
        <p>Note that we have the constraints $0 \le \theta_i^k \le 1$. A convenient reparameterization can be achieved using the framework of the exponential family of distributions, which suggests the parameterization $\theta_i^k = \mathrm{sig}(\eta_i^k)$, where the natural parameter $\eta_i^k$ is unconstrained and $\mathrm{sig}(\mathrm{arg}) = 1/(1 + \exp(-\mathrm{arg}))$ is the logistic function.</p>
        <p>Gaussian. The Gaussian model can be used to predict continuous quantities, e.g., the amount of a given medication for a given patient. The Gaussian model is
$$P(x_i^k \mid \theta_i^k) \propto \exp\left(-\frac{1}{2\sigma^2}\left(x_i^k - \theta_i^k\right)^2\right),$$
where we assume that $\sigma^2$ is either known or estimated as a global parameter in a separate process. With a Gaussian likelihood function we get
$$\mathrm{loss}^{G}_{i,k} = \frac{1}{2\sigma^2}\left(x_i^k - \theta_i^k\right)^2 + \frac{1}{2(\alpha_i^k)^2}\left(c_i^k - \theta_i^k\right)^2.$$
Note that the first term is simply the (scaled) squared error. The second term is derived from the conjugate Gaussian distribution and implements another cost term, which can be used to model a prior bias toward a user-specified $c_i^k$. The contribution of the prior drops out with $\alpha_i^k \to \infty$.</p>
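        <p>As a minimal sketch (plain Python; the function names are hypothetical, not from the paper), the Gaussian cost can be evaluated and minimized in closed form. Setting its derivative with respect to $\theta_i^k$ to zero gives a precision-weighted average of the observation $x_i^k$ and the prior mean $c_i^k$, which approaches $x_i^k$ as $\alpha_i^k \to \infty$, i.e., when the prior drops out.</p>
        <preformat>
```python
def gaussian_loss(x, theta, sigma, c, alpha):
    """Gaussian cost: scaled squared error plus a conjugate-prior term pulling theta toward c."""
    data_term = (x - theta) ** 2 / (2.0 * sigma ** 2)
    prior_term = (c - theta) ** 2 / (2.0 * alpha ** 2)
    return data_term + prior_term


def gaussian_minimizer(x, sigma, c, alpha):
    """Closed-form argmin of gaussian_loss: a precision-weighted average of x and c."""
    w_data = 1.0 / sigma ** 2
    w_prior = 1.0 / alpha ** 2
    return (w_data * x + w_prior * c) / (w_data + w_prior)


if __name__ == "__main__":
    # Equal variances: the optimum lies halfway between observation and prior mean.
    print(gaussian_minimizer(x=4.0, sigma=1.0, c=0.0, alpha=1.0))  # 2.0
    # A very weak prior (large alpha): the optimum is essentially the observation.
    print(gaussian_minimizer(x=4.0, sigma=1.0, c=0.0, alpha=1e6))
```
        </preformat>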
        <p>Binomial. If the Bernoulli model represents the outcome of tossing one coin, the binomial model corresponds to the event of tossing a coin $K$ times. We get
$$P(x_i^k \mid \theta_i^k) \propto (\theta_i^k)^{x_i^k}(1 - \theta_i^k)^{K - x_i^k}.$$
The cost function is identical to the cost function of the Bernoulli model (Equation 1), except that $K_i^k = K - 1$ and $x_i^k \in \{0, 1, \ldots, K\}$ is the number of observed "heads".</p>
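        <p>The Bernoulli and binomial costs can be sketched in a few lines of Python, assuming the cost has the form $-(x_i^k + \alpha_i^k - 1)\log\theta_i^k - (K_i^k - x_i^k + \beta_i^k)\log(1 - \theta_i^k)$ with $K_i^k = 0$ for Bernoulli and $K_i^k = K - 1$ for $K$ binomial tosses, and the logistic link $\theta_i^k = \mathrm{sig}(\eta_i^k)$. The function names are illustrative. With a flat prior ($\alpha_i^k = \beta_i^k = 1$) the cost reduces to the usual negative log-likelihood, and in the binomial case it is minimized at the empirical frequency $x_i^k / K$.</p>
        <preformat>
```python
import math


def sig(eta):
    """Logistic function: maps an unconstrained natural parameter to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def bernoulli_loss(x, theta, alpha=1.0, beta=1.0, K=0):
    """Penalized negative log-likelihood of the assumed Equation-1 form.

    K=0 gives the Bernoulli case; for a binomial with n tosses pass K=n-1.
    alpha - 1 and beta - 1 act as virtual observations of x=1 and x=0.
    """
    return (-(x + alpha - 1.0) * math.log(theta)
            - (K - x + beta) * math.log(1.0 - theta))


def bernoulli_loss_natural(x, eta, alpha=1.0, beta=1.0, K=0):
    """Same cost, parameterized by the unconstrained natural parameter eta."""
    return bernoulli_loss(x, sig(eta), alpha, beta, K)


if __name__ == "__main__":
    # Bernoulli with a flat prior: the cost of an observed x=1 is -log(theta).
    print(bernoulli_loss(1, 0.8))
    # Binomial with 10 tosses and 7 heads, scanned on a grid of theta values.
    grid = [i / 100 for i in range(1, 100)]
    best = min(grid, key=lambda t: bernoulli_loss(7, t, K=9))
    print(best)  # 0.7, the empirical frequency
```
        </preformat>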
        <p>
          Poisson. Typical relational count data that can be modelled by Poisson
distributions are the number of messages sent between users in a given time
frame. For the Poisson distribution, we get
$$P(x_i^k \mid \theta_i^k) \propto (\theta_i^k)^{x_i^k}\exp(-\theta_i^k) \tag{2}$$
and
$$\mathrm{loss}^{P}_{i,k} = -(x_i^k + \alpha_i^k - 1)\log\theta_i^k + (\beta_i^k + 1)\,\theta_i^k,$$
with $x_i^k \in \mathbb{N}_0$, where $\alpha_i^k &gt; 0$ and $\beta_i^k \ge 0$ are parameters of the conjugate gamma distribution.
The contribution of the prior drops out with $\alpha_i^k = 1$, $\beta_i^k = 0$. Here, the natural
parameter $\eta_i^k$ is defined via $\theta_i^k = \exp(\eta_i^k)$. Note that the cost function of the Poisson
model is, up to parameter-independent terms, identical to the KL-divergence
cost function [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
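        <p>A minimal Python sketch of the Poisson cost, assuming the conjugate-gamma form $-(x_i^k + \alpha_i^k - 1)\log\theta_i^k + (\beta_i^k + 1)\theta_i^k$ and the log link $\theta_i^k = \exp(\eta_i^k)$ (names are illustrative). With the prior switched off ($\alpha_i^k = 1$, $\beta_i^k = 0$) the cost is $\theta - x\log\theta$; adding the parameter-independent term $x\log x - x$ gives the generalized KL divergence $x\log(x/\theta) - x + \theta$, which is the correspondence noted in the text.</p>
        <preformat>
```python
import math


def poisson_loss(x, theta, alpha=1.0, beta=0.0):
    """Poisson cost with a conjugate gamma prior; alpha=1, beta=0 removes the prior."""
    return -(x + alpha - 1.0) * math.log(theta) + (beta + 1.0) * theta


def poisson_loss_natural(x, eta, alpha=1.0, beta=0.0):
    """Same cost with the natural parameter eta, where theta = exp(eta)."""
    return poisson_loss(x, math.exp(eta), alpha, beta)


def generalized_kl(x, theta):
    """Generalized KL divergence between a positive count x and a rate theta."""
    return x * math.log(x / theta) - x + theta


if __name__ == "__main__":
    x = 3.0
    for theta in (0.5, 3.0, 7.0):
        # The difference equals x*log(x) - x for every theta: it does not depend on theta.
        print(theta, generalized_kl(x, theta) - poisson_loss(x, theta))
```
        </preformat>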
        <p>Multinomial. The multinomial distribution is often used for textual data
where counts correspond to how often a term occurred in a given document. For
the multinomial model we get
$$\mathrm{loss}^{M}_{i,k} = -(x_i^k + \alpha_i^k - 1)\log\theta_i^k.$$</p>
        <p>
          Ranking Criterion. Finally, we consider the ranking criterion, which is used
in the Bernoulli setting with $x_i^k \in \{0,1\}$. It is not derived from an exponential
family model but has been used successfully in triple prediction, e.g., in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
Consider a binary relation where the first attribute is the subject and the second
attribute is the object. For a known true tuple with $x_i^k = 1$ we define
$\mathrm{loss}^{R}_{i,k} = \sum_{c=1}^{C}\max\left(0,\ 1 - \theta_i^k + \theta_{i,c}^k\right)$, where $\theta_{i,c}^k$ is randomly chosen from all triples with
the same subject and predicate but with a different object with target 0. Thus
one scores the correct triple higher than its corrupted one up to a margin of
1. The use of a ranking criterion in relational learning was pioneered by [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] as
Bayesian Personalized Ranking (BPR) with a related ranking cost function of
the form $\mathrm{loss}^{BPR}_{i,k} = -\sum_{c=1}^{C}\log\mathrm{sig}\left(\theta_i^k - \theta_{i,c}^k\right)$.
        </p>
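        <p>Both ranking costs can be sketched for one true triple and a list of $C$ corrupted scores $\theta_{i,c}^k$ (Python; the function names are hypothetical, and the scores are taken as given rather than produced by a particular model). The margin loss is zero once the true triple outscores every corrupted one by at least 1, whereas the BPR cost never vanishes exactly but keeps decreasing as the score gap grows.</p>
        <preformat>
```python
import math


def margin_ranking_loss(score_true, scores_corrupted, margin=1.0):
    """Hinge loss: sum over corrupted triples of max(0, margin - score_true + score_c)."""
    return sum(max(0.0, margin - score_true + s) for s in scores_corrupted)


def bpr_loss(score_true, scores_corrupted):
    """BPR cost: negative sum of log-sigmoid of the score differences."""
    # -log sig(d) = log(1 + exp(-d)); log1p is used for numerical accuracy.
    return sum(math.log1p(math.exp(-(score_true - s))) for s in scores_corrupted)


if __name__ == "__main__":
    corrupted = [0.2, 1.5, -0.7]
    print(margin_ranking_loss(2.0, corrupted))  # 0.5: only the 1.5 triple violates the margin
    print(bpr_loss(2.0, corrupted))
```
        </preformat>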
        <p>Interpretation. After modelling, the probability $P(x_i^k \mid \theta_i^k)$, resp. $P(x_i^k \mid \eta_i^k)$, can be interpreted as the plausibility of the observation given the model. For example, in the Bernoulli model we can evaluate how plausible an observed tuple is, and we can predict which unobserved tuples would very likely be true under the model.</p>
        <p>A Framework for Latent-Factor Models</p>
      </sec>
      <sec id="sec-2-4">
        <title>The General Setting</title>
        <p>We consider two models where all relations have the same arity $L_k$. In the multi-task setting, we assume the model</p>
        <p>$$\theta^k_{E_i = \{e_{l(i,1)}, e_{l(i,2)}, \ldots, e_{l(i,L_k)}\}} = f_{w_k}\left(a_{l(i,1)}, a_{l(i,2)}, \ldots, a_{l(i,L_k)}\right). \tag{3}$$
Here $a_l$ is a vector of $\tilde{r} \in \mathbb{N}$ latent factors associated with $e_l$ to be optimized during the training phase.<sup>4</sup> As before, $l(i,i')$ maps attribute $i'$ of tuple $E_i$ to the index of the entity. This is a multi-task setting in the sense that for each relation $r_k$ a separate function with parameters $w_k$ is modelled.</p>
        <p>In the single-task setting, we assume the model</p>
        <p>$$\theta^k_{E_i = \{e_{l(i,1)}, e_{l(i,2)}, \ldots, e_{l(i,L_k)}\}} = f_{w}\left(a_{l(i,1)}, a_{l(i,2)}, \ldots, a_{l(i,L_k)}, \tilde{a}_k\right).$$
Note that here we consider a single function with parameter vector $w$, where a relation is represented by its latent factor $\tilde{a}_k$.</p>
        <p>If we work with natural parameters, we replace $\theta^k_{E_i}$ with $\eta^k_{E_i}$ in the last two equations.</p>
      </sec>
      <sec id="sec-2-5">
        <title>Predictive Models</title>
        <p>We now discuss models for $f_{w_k}(\cdot)$ and $f_w(\cdot)$. Note that not only the model weights are uncertain but also the latent factors of the entities. We first describe tensor approaches for the multi-task setting and the single-task setting and then describe two neural network models.</p>
        <p><sup>4</sup> Here we assume that the rank $\tilde{r}$ is the same for all entities; this assumption can be relaxed in some models.</p>
      </sec>
      <sec id="sec-2-6">
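        <title>Tensor Models for the Multi-task Setting</title>
        <p>
          Here, the model is
$$f_{w_k}\left(a_{l(i,1)}, a_{l(i,2)}, \ldots, a_{l(i,L_k)}\right) = \sum_{s_1=1}^{\tilde{r}}\sum_{s_2=1}^{\tilde{r}}\cdots\sum_{s_{L_k}=1}^{\tilde{r}} w_{s_1,s_2,\ldots,s_{L_k},k}\; a_{l(i,1),s_1}\, a_{l(i,2),s_2}\cdots a_{l(i,L_k),s_{L_k}}. \tag{4}$$
This equation describes a RESCAL model, which is a special case of a Tucker
tensor model with the constraint that an entity has a unique latent representation,
independent of where it appears in a relation [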
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. This property is important to
achieve relational collective learning [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
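        <p>In the original RESCAL model, one considers binary relations with $L_k = 2$ (RESCAL2). Here $A$ with $(A)_{l,s} = a_{l,s}$ is the matrix of all latent representations of all entities. Then Equation 4 can be written in tensor form as
$$\mathcal{F} = \mathcal{R} \times_1 A \times_2 A,$$
where $(\mathcal{F})_{l_1,l_2,k} = f_{w_k}(a_{l_1}, a_{l_2})$ and the core tensor is $(\mathcal{R})_{s_1,s_2,k} = w_{s_1,s_2,k}$.</p>
        <p>Tensor Models for the Single-task Setting. Here, the model is
$$f_{w}\left(a_{l(i,1)}, a_{l(i,2)}, \ldots, a_{l(i,L_k)}, \tilde{a}_k\right) = \sum_{s_1=1}^{\tilde{r}}\sum_{s_2=1}^{\tilde{r}}\cdots\sum_{s_{L_k}=1}^{\tilde{r}}\sum_{t} w_{s_1,s_2,\ldots,s_{L_k},t}\; a_{l(i,1),s_1}\, a_{l(i,2),s_2}\cdots a_{l(i,L_k),s_{L_k}}\, \tilde{a}_{k,t}. \tag{5}$$
Note that the main difference is that now the relation is represented by its own latent factor $\tilde{a}_k$. Again, this equation describes a RESCAL model. For binary relations one speaks of a RESCAL3 model, and Equation 5 becomes
$$\mathcal{F} = \mathcal{R} \times_1 A \times_2 A \times_3 \tilde{A},$$
where $(\tilde{A})_{k,t} = \tilde{a}_{k,t}$ and the core tensor is $(\mathcal{R})_{s_1,s_2,t} = w_{s_1,s_2,t}$.</p>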
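        <p>To make the tensor notation concrete, the following NumPy sketch (random toy sizes; all variable names are illustrative) evaluates RESCAL2 as $\mathcal{F} = \mathcal{R} \times_1 A \times_2 A$, i.e., one matrix $A R_k A^\top$ per relation, and RESCAL3 with relation factors $\tilde{A}$. Choosing $\tilde{A}$ as the identity matrix, so that each $\tilde{a}_k$ is a unit vector, reproduces the multi-task RESCAL2 scores.</p>
        <preformat>
```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, rank, n_relations = 5, 3, 4

A = rng.normal(size=(n_entities, rank))  # row l holds the latent factors a_l

# RESCAL2 (multi-task): one core slice per relation, core shape (rank, rank, n_relations).
R2 = rng.normal(size=(rank, rank, n_relations))
# F[l1, l2, k] = sum over s1, s2 of R2[s1, s2, k] * A[l1, s1] * A[l2, s2]
F2 = np.einsum("abk,ia,jb->ijk", R2, A, A)

# RESCAL3 (single-task): relations get latent factors A_tilde of shape (n_relations, t_dim).
t_dim = 2
A_tilde = rng.normal(size=(n_relations, t_dim))
R3 = rng.normal(size=(rank, rank, t_dim))
F3 = np.einsum("abt,ia,jb,kt->ijk", R3, A, A, A_tilde)

# With A_tilde = identity, each a_tilde_k is a unit vector and RESCAL3 reduces to RESCAL2.
F3_unit = np.einsum("abt,ia,jb,kt->ijk", R2, A, A, np.eye(n_relations))

if __name__ == "__main__":
    print(F2.shape, F3.shape)                                # (5, 5, 4) (5, 5, 4)
    print(np.allclose(F2, F3_unit))                          # True
    # Slice k of F2 equals the matrix product A @ R_k @ A.T.
    print(np.allclose(F2[:, :, 1], A @ R2[:, :, 1] @ A.T))   # True
```
        </preformat>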
        <p>
          If $\tilde{a}_k$ is a unit vector with $\tilde{a}_{k,t} = 1$ for $t = k$ and 0 otherwise, then we recover the
multi-task setting. If all weights are 0 except for the "diagonal" weights with $s_1 = s_2 = \ldots = s_{L_k} = t$, this is a PARAFAC model and only a single sum remains. The
PARAFAC model is used in factorization machines [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. In factorization
machines, attributes with ordinal or real values are modelled by $a_{z(i)} = z(i)\,a$,
where $z(i)$ is the value of the attribute in $E_i$ and $a$ is a latent factor vector for
the attribute, independent of the particular value $z(i)$.
        </p>
        <p>Please note that the $L_k$-order polynomials also contain all lower-order
polynomials if we set, e.g., $a_{l,1} = 1\ \forall l$. In the factorization machine, the order of
the polynomials is typically limited to 1 or 2, i.e., all higher-order polynomials
obtain a weight of 0.</p>
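        <p>The two remarks above can be checked numerically. In this NumPy sketch (toy sizes; names are illustrative), a single-task core that is zero except on its diagonal $s_1 = s_2 = t$ gives the same scores as the single-sum PARAFAC form, and fixing the first latent factor of every entity to 1 splits a bilinear RESCAL2 score into a term that is linear in the object factors plus the remaining terms.</p>
        <preformat>
```python
import numpy as np

rng = np.random.default_rng(1)
n_entities, rank, t_dim = 4, 3, 3

A = rng.normal(size=(n_entities, rank))
A_tilde = rng.normal(size=(2, t_dim))          # two relations with latent factors

# PARAFAC: only "diagonal" weights w[s, s, s] are non-zero (needs rank == t_dim here).
w_diag = rng.normal(size=rank)
core = np.zeros((rank, rank, t_dim))
for s in range(rank):
    core[s, s, s] = w_diag[s]

full_sum = np.einsum("abt,ia,jb,kt->ijk", core, A, A, A_tilde)
single_sum = np.einsum("s,is,js,ks->ijk", w_diag, A, A, A_tilde)

# Lower-order terms: fix a_{l,1} = 1 for all entities (column 0 in Python indexing).
A1 = A.copy()
A1[:, 0] = 1.0
R = rng.normal(size=(rank, rank))              # one relation's core slice
scores = A1 @ R @ A1.T
# Terms with s1 = 0 no longer depend on the subject: they are linear in the object factors.
linear_in_object = np.tile(R[0, :] @ A1.T, (n_entities, 1))
rest = A1[:, 1:] @ R[1:, :] @ A1.T

if __name__ == "__main__":
    print(np.allclose(full_sum, single_sum))              # True
    print(np.allclose(scores, linear_in_object + rest))   # True
```
        </preformat>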
      </sec>
      <sec id="sec-2-7">
        <title>Neural Tensor Networks. Here, the model is</title>
        <p>
          The output is a weighted combination of the logistic function applied to $H$
different tensor models. This is the model used in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], where $\tanh(\cdot)$ was
used instead of the logistic function.
        </p>
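        <p>Google Vault Model. Here a neural network is used of the form
$$f_{w}\left(a_{l(i,1)}, \ldots, a_{l(i,L_k)}, \tilde{a}_k\right) = \sum_{h=1}^{H} w_h\, \mathrm{sig}\left(\sum_{s_1=1}^{\tilde{r}} v^h_{1,s_1} a_{l(i,1),s_1} + \ldots + \sum_{s_{L_k}=1}^{\tilde{r}} v^h_{L_k,s_{L_k}} a_{l(i,L_k),s_{L_k}} + \sum_{t} v^h_{0,t}\, \tilde{a}_{k,t}\right).$$</p>
        <p>
          The latent factors are simply the inputs to a neural network with one hidden
layer. This model was used in [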
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] in the context of the Google Knowledge Graph. It
is related to tensor models for the single-task setting, where the fixed polynomial
basis functions are replaced by adaptable neural basis functions with logistic
transfer functions.</p>
        <p>Complete Data. This means that for all relevant tuples the target variables
are available.
        </p>
        <p>Assumed Complete Data. This is mostly relevant when $x_i^k$ is an existential
variable, where one might assume that tuples that are not listed in the relation
are false. Mathematically, we then obtain a complete data model, and this is the
setting in our experiments. Another interpretation would be that with sparse
data $x_i^k = 0$ is a correct imputation for those tuples.</p>
        <p>Missing at Random. This is relevant, e.g., when $x_i^k$ represents a rating.
Missing ratings might be missing at random, and the corresponding tuples should
be ignored in the cost function. Computationally, this can be exploited most efficiently
by gradient-based optimization methods (see Section 4.3). Alternatively,
one can use $\alpha_i^k$ and $\beta_i^k$ to implement prior knowledge about missing data.</p>
        <p>Ranking Criterion. With the ranking criterion, one does not really care whether
unobserved tuples are unknown or untrue; one only insists that the observed
tuples should obtain a higher score, by a margin, than unobserved tuples.</p>
        <p>In all approaches the parameters and latent factors are regularized with the penalty
terms $\lambda_A\|A\|_F$ and $\lambda_W\|W\|_F$, where $\|\cdot\|_F$ indicates the Frobenius norm and
$\lambda_A \ge 0$ and $\lambda_W \ge 0$ are regularization parameters.</p>
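        <p>As an illustration of how the data assumptions change the cost, the NumPy sketch below (names are hypothetical; Gaussian cost with $\sigma = 1$ and RESCAL2 predictions) uses a 0/1 mask over the tensor entries: an all-ones mask corresponds to (assumed) complete data, while zeros drop missing-at-random entries from the cost. The penalties follow the unsquared Frobenius norms written above; using squared norms instead would only change the penalty line.</p>
        <preformat>
```python
import numpy as np


def masked_gaussian_cost(X, mask, A, R, lam_A=0.0, lam_W=0.0):
    """Squared-error cost over observed entries plus Frobenius-norm penalties.

    X, mask: arrays of shape (n, n, m); mask[i, j, k] = 1 if x_ij^k is observed.
    A: entity factors (n, rank); R: core tensor (rank, rank, m).
    """
    F = np.einsum("abk,ia,jb->ijk", R, A, A)       # RESCAL2 predictions
    residual = mask * (X - F)                        # unobserved entries contribute 0
    data_term = 0.5 * np.sum(residual ** 2)
    penalty = lam_A * np.linalg.norm(A) + lam_W * np.linalg.norm(R.ravel())
    return data_term + penalty


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n, rank, m = 6, 2, 3
    A = rng.normal(size=(n, rank))
    R = rng.normal(size=(rank, rank, m))
    X = (rng.random(size=(n, n, m)) > 0.8).astype(float)      # sparse 0/1 tensor

    complete = np.ones_like(X)                                 # (assumed) complete data
    observed = (rng.random(size=X.shape) > 0.5).astype(float)  # missing at random
    print(masked_gaussian_cost(X, complete, A, R, 0.1, 0.1))
    print(masked_gaussian_cost(X, observed, A, R, 0.1, 0.1))
```
        </preformat>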
      </sec>
      <sec id="sec-2-8">
        <title>Optimizing the Cost Functions</title>
        <p>
          Alternating Least Squares. The minimization of the Gaussian cost function
$\mathrm{loss}^{G}$ with complete data can be implemented via very efficient alternating least
squares (ALS) iterative updates, effectively exploiting data sparsity in the
(assumed) complete-data setting [
          <xref ref-type="bibr" rid="ref10 ref7">10, 7</xref>
          ]. For example, RESCAL has been scaled up
to work with several million entities and close to 100 relation types. The number
of possible tuples that can be predicted is the square of the number of entities
times the number of predicates: for example, RESCAL has been applied to the
Yago ontology with $10^{14}$ potential tuples [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-9">
        <title>Natural Parameters: Gradient-Based Optimization</title>
        <p>
          When natural parameters are used, unconstrained gradient-based optimization routines like
L-BFGS can be employed; see, for example, [
          <xref ref-type="bibr" rid="ref6 ref9">6, 9</xref>
          ].
        </p>
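        <p>With natural parameters, the Bernoulli cost for RESCAL2 becomes a logistic loss on $\eta_{ijk} = a_i^\top R_k a_j$, whose gradients are easy to write down. The NumPy sketch below (names are illustrative; flat prior and small squared penalties assumed) computes the cost and its gradients over all entries of the (assumed) complete data tensor. In practice these gradients would be handed to an optimizer such as L-BFGS; the usage example only takes one small gradient step.</p>
        <preformat>
```python
import numpy as np


def logistic_rescal_cost_and_grads(X, A, R, lam=0.1):
    """Bernoulli cost with natural parameters eta = a_i^T R_k a_j (flat prior).

    X: 0/1 tensor (n, n, m); A: (n, rank); R: (rank, rank, m).
    Returns the cost and its gradients with respect to A and R.
    """
    eta = np.einsum("abk,ia,jb->ijk", R, A, A)
    # -log P(x | eta) = log(1 + exp(eta)) - x * eta, computed stably.
    cost = np.sum(np.logaddexp(0.0, eta) - X * eta)
    cost += 0.5 * lam * (np.sum(A ** 2) + np.sum(R ** 2))
    D = 0.5 * (1.0 + np.tanh(0.5 * eta)) - X        # d cost / d eta = sig(eta) - x
    grad_A = (np.einsum("ijk,abk,jb->ia", D, R, A)   # A in the subject position
              + np.einsum("ijk,abk,ia->jb", D, R, A) # A in the object position
              + lam * A)
    grad_R = np.einsum("ijk,ia,jb->abk", D, A, A) + lam * R
    return cost, grad_A, grad_R


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    n, rank, m = 8, 2, 2
    X = (rng.random(size=(n, n, m)) > 0.7).astype(float)
    A = 0.1 * rng.normal(size=(n, rank))
    R = 0.1 * rng.normal(size=(rank, rank, m))
    cost, gA, gR = logistic_rescal_cost_and_grads(X, A, R)
    step = 1e-3
    new_cost, _, _ = logistic_rescal_cost_and_grads(X, A - step * gA, R - step * gR)
    print(cost, new_cost)   # a small step along the negative gradient should lower the cost
```
        </preformat>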
      </sec>
      <sec id="sec-2-10">
        <title>Non-Negative Tensor Factorization</title>
        <p>
          If we use the basis representation with $\theta_i^k$ parameters, we need to enforce that $\theta_i^k \ge 0$. One option is to employ
non-negative tensor factorization, which leads to non-negative factors and weights. For
implementation details, consult [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
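        <p>Multiplicative updates keep all factors non-negative automatically. As a simplified stand-in for the tensor case, the NumPy sketch below applies the well-known Lee and Seung multiplicative updates for the generalized KL divergence (the Poisson-type cost discussed above) to a single non-negative matrix $X \approx WH$. It is not the RESCAL-specific algorithm cited above, and all names are illustrative.</p>
        <preformat>
```python
import numpy as np


def generalized_kl(X, Y, eps=1e-12):
    """Generalized KL divergence D(X || Y) for non-negative arrays."""
    return np.sum(X * np.log((X + eps) / (Y + eps)) - X + Y)


def nmf_kl(X, rank, n_iter=200, seed=0, eps=1e-12):
    """Multiplicative updates for X ~ W @ H under the generalized KL divergence."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random(size=(n, rank)) + 0.1
    H = rng.random(size=(rank, m)) + 0.1
    history = []
    for _ in range(n_iter):
        ratio = X / (W @ H + eps)
        H *= (W.T @ ratio) / (W.sum(axis=0)[:, None] + eps)   # divide by column sums of W
        ratio = X / (W @ H + eps)
        W *= (ratio @ H.T) / (H.sum(axis=1)[None, :] + eps)   # divide by row sums of H
        history.append(generalized_kl(X, W @ H))
    return W, H, history


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    X = rng.poisson(lam=1.0, size=(10, 8)).astype(float)   # toy count matrix
    W, H, history = nmf_kl(X, rank=3)
    print(history[0], history[-1])
    print(W.min() >= 0, H.min() >= 0)                      # True True
```
        </preformat>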
        <p>
          Stochastic Gradient Descent (SGD). In principle, SGD could be applied
to any setting with any cost function. In our experiments, however, SGD did not converge
to a reasonable solution in tolerable training time with cost functions from
the exponential family of distributions and (assumed) complete data. SGD and
batch SGD were used successfully with ranking cost functions in [
          <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
          ], and we
also achieved reasonable results with BPR.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>Due to space limitations we only report experiments using the binary multi-task
model RESCAL2. We performed experiments on three commonly used
benchmark data sets for relational learning:</p>
      <p>Kinship: 104 entities and $M = 26$ relations that consist of several kinship
relations within the Alyawarra tribe.</p>
      <p>Nations: 14 entities and $M = 56$ relations that consist of relations between
nations (treaties, immigration, etc.). Additionally, the data set contains
attribute information for each entity.</p>
      <p>UMLS: 135 entities and $M = 49$ relations that consist of biomedical
relationships between categorized concepts of the Unified Medical Language System
(UMLS).</p>
      <sec id="sec-3-1">
        <title>Experiments with Different Cost Functions and Representations</title>
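        <p>Here $x_i^k = 1$ stands for the existence of a tuple, otherwise $x_i^k = 0$. We evaluated the different methods using the area under the precision-recall curve (AUPRC), performing 10-fold cross-validation. Table 1 shows results for the three data sets ("nn" stands for non-negative and "nat" for the usage of natural parameters). In all cases, the RESCAL model with $\mathrm{loss}^{G}$ ("RESCAL") gives excellent performance, and the Bernoulli likelihood with natural parameters ("natBern") performs even slightly better. The Poisson model with natural parameters also performs quite well. The performance of the multinomial models is significantly worse. We also looked at the sparsity of the solutions. As can be expected, only the models employing non-negative factorization lead to sparse models. For the Kinship data set, only approximately 2% of the coefficients are nonzero, whereas models using natural parameters are dense. SGD with the BPR ranking criterion and AdaGrad batch optimization was slightly worse than RESCAL.</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>AUPRC on the three benchmark data sets (10-fold cross-validation); "nn" denotes non-negative and "nat" natural parameters.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Data set</th><th>Rand</th><th>RESCAL</th><th>nnPoiss</th><th>nnMulti</th><th>natBern</th><th>natPoiss</th><th>natMulti</th><th>SGD</th><th>stdev</th></tr>
            </thead>
            <tbody>
              <tr><td>Nations</td><td>0.212</td><td>0.843</td><td>0.710</td><td>0.704</td><td>0.850</td><td>0.847</td><td>0.659</td><td>0.825</td><td>0.05</td></tr>
              <tr><td>Kinship</td><td>0.039</td><td>0.962</td><td>0.918</td><td>0.889</td><td>0.980</td><td>0.981</td><td>0.976</td><td>0.931</td><td>0.01</td></tr>
              <tr><td>UMLS</td><td>0.008</td><td>0.986</td><td>0.968</td><td>0.916</td><td>0.986</td><td>0.967</td><td>0.922</td><td>0.971</td><td>0.01</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption>
            <p>Results on the count data.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Data set</th><th>Rand</th><th>RESCAL</th><th>nnPoiss</th><th>nnMulti</th><th>natBin</th><th>natPoiss</th><th>natMulti</th><th>RES-P</th><th>stdev</th></tr>
            </thead>
            <tbody>
              <tr><td>Nations</td><td>0.181</td><td>0.627</td><td>0.616</td><td>0.609</td><td>0.637</td><td>0.632</td><td>0.515</td><td>0.638</td><td>0.01</td></tr>
              <tr><td>Kinship</td><td>0.035</td><td>0.949</td><td>0.933</td><td>0.930</td><td>0.950</td><td>0.952</td><td>0.951</td><td>0.951</td><td>0.01</td></tr>
              <tr><td>UMLS</td><td>0.007</td><td>0.795</td><td>0.790</td><td>0.759</td><td>0.806</td><td>0.806</td><td>0.773</td><td>0.806</td><td>0.01</td></tr>
            </tbody>
          </table>
        </table-wrap>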
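        <p>AUPRC is commonly estimated by average precision, i.e., the mean of the precision values at the ranks of the true tuples. The following Python sketch shows this estimator on a toy ranking; the function name is illustrative, and average precision is assumed here as one common choice (interpolated estimators give slightly different values).</p>
        <preformat>
```python
def average_precision(scores, labels):
    """Average precision: mean precision at the rank of each positive example.

    scores: predicted plausibility of each tuple; labels: 1 if the tuple is true.
    Ties are broken by the original order of the tuples.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6]
    labels = [1, 0, 1, 0]
    print(average_precision(scores, labels))   # (1/1 + 2/3) / 2 = 0.8333...
```
        </preformat>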
        <p>Another issue is run-time performance. RESCAL with $\mathrm{loss}^{G}$ is fastest
since the ALS updates can efficiently exploit data sparsity, taking 1.9 seconds
on Kinship on an average laptop (Intel(R) Core(TM) i5-3320M with 2.60 GHz).
It is well known that the non-negative multiplicative updates are slower, since they have
to consider the constraints; they take approximately 90 seconds on Kinship. Both
the non-negative Poisson model and the non-negative multinomial model can
exploit data sparsity. The exponential family approaches using natural parameters
are slowest, since they have to construct estimates for all ground atoms in the
(assumed) complete-data setting, taking approximately 300 seconds on Kinship.
SGD converges in 108 seconds on Kinship.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Experiments on Count Data</title>
        <p>Here $x_i^k \in \mathbb{N}_0$ is the number of observed counts. Table 2 shows results where we
generated 10 database instances (worlds) from a trained Bernoulli model,
generated count data from 9 of these database instances, and used the tenth instance for
testing. Although RESCAL still gives very good performance, the best results are
obtained by models more appropriate for count data, i.e., the binomial model
and the Poisson model using natural parameters. The non-negative models are
slightly worse than the models using natural parameters.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Probabilities with RESCAL</title>
        <p>
          Due to its excellent performance and computational efficiency, it would be very
desirable to use the RESCAL model with $\mathrm{loss}^{G}$ and ALS whenever possible.
As discussed in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], by applying post-transformations, RESCAL predictions can
be mapped to probabilities in Bernoulli experiments. For Poisson data we can
assume a natural parameter model with $\alpha_i^k = 2$ and model $\tilde{x}_i^k = \log(1 + x_i^k)$,
which leads to a sparse data representation that can be modelled efficiently with
RESCAL and $\mathrm{loss}^{G}$. The results are shown as RES-P in Table 2 and are among
the best results for the count data.
        </p>
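        <p>The choice $\alpha_i^k = 2$ can be motivated by a short calculation, assuming the conjugate-gamma Poisson cost $-(x_i^k + \alpha_i^k - 1)\log\theta_i^k + (\beta_i^k + 1)\theta_i^k$ with the second prior parameter at its neutral value $\beta_i^k = 0$. The minimizer is then $\theta_i^k = x_i^k + 1$, so the corresponding natural parameter is exactly the transformed target $\tilde{x}_i^k$, and zero counts stay zero, which is why the representation remains sparse.</p>
        <preformat>
```latex
\begin{gathered}
\mathrm{loss}^{P}_{i,k}\big|_{\alpha_i^k = 2,\ \beta_i^k = 0} = -(x_i^k + 1)\log\theta_i^k + \theta_i^k \\
\frac{\partial\,\mathrm{loss}^{P}_{i,k}}{\partial\theta_i^k} = -\frac{x_i^k + 1}{\theta_i^k} + 1 = 0
\;\Longrightarrow\; \theta_i^k = x_i^k + 1 \\
\eta_i^k = \log\theta_i^k = \log(1 + x_i^k) = \tilde{x}_i^k ,
\qquad x_i^k = 0 \;\Longrightarrow\; \tilde{x}_i^k = 0
\end{gathered}
```
        </preformat>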
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Extensions</title>
      <sec id="sec-4-1">
        <title>SUNS Models</title>
        <p>Consider a triple store. In addition to the models described in Section 3, we can
also consider the following three models for $f(\mathrm{subject}, \mathrm{predicate}, \mathrm{object})$:</p>
        <sec id="sec-4-1-1">
          <title>X X asubject;mapuo;</title>
          <p>m=1 u=1</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>X X aobject;masup;</title>
          <p>m=1 u=1</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>X X apredicate;masuo:</title>
          <p>
            m=1 u=1
These are three Tucker1 models and were used as SUNS models (SUNS-S,
SUNS-O, SUNS-P) in [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]. $a_{po}$, $a_{sp}$, and $a_{so}$ are latent representations of $(p, o)$, $(s, p)$,
and $(s, o)$, respectively.
          </p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>Model Combinations</title>
        <p>
          The different SUNS models and RESCAL models have different modelling
capabilities, and often a combination of several models gives the best results [
          <xref ref-type="bibr" rid="ref2 ref8">2, 8</xref>
          ].
Table 3 shows the performance of the RESCAL model, two SUNS models, and
an additive model of all three models. For Nations, SUNS-P
performs well and boosts the performance of the combined model. SUNS-P can
model correlations between relations, e.g., between likes and loves.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>
        We have presented a general framework for modelling probabilistic databases
with factor models. When data are complete and sparse, the RESCAL model
with a Gaussian likelihood function and ALS updates is most efficient and highly
scalable. We showed that this model is also applicable to binary data and to
count data. Non-negative modelling approaches give very sparse factors, but
performance decreases slightly. An issue is the model rank $\tilde{r}$. In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], it has been
shown that the rank can be reduced by using a combination of a factor model
with a model for local interactions, modelling, for example, the triangle rule.
Similarly, the exploitation of type constraints can drastically reduce the number of
plausible tuples and reduces the computational load dramatically [
        <xref ref-type="bibr" rid="ref1 ref4">4, 1</xref>
        ].
      </p>
      <p>Acknowledgements. M. N. acknowledges support by the Center for Brains,
Minds and Machines, funded by NSF STC award CCF-1231216.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>X. L.</given-names>
            <surname>Dong</surname>
          </string-name>
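          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gabrilovich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Heitz</surname>
          </string-name>
          ,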
          <string-name>
            <given-names>W.</given-names>
            <surname>Horn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Strohmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sun</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Knowledge vault: A web-scale approach to probabilistic knowledge fusion</article-title>
          .
          <source>In KDD</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tresp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Nickel</surname>
          </string-name>
          .
          <article-title>Link prediction in multi-relational graphs using additive models</article-title>
          . In SeRSy workshop, ISWC,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>D.</given-names>
            <surname>Krompaß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nickel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Tresp</surname>
          </string-name>
          .
          <article-title>Non-negative tensor factorization with RESCAL</article-title>
          .
          <source>In Tensor Methods for Machine Learning, ECML workshop</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D.</given-names>
            <surname>Krompaß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nickel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Tresp</surname>
          </string-name>
          .
          <article-title>Factorizing large heterogeneous multirelational-data</article-title>
          .
          <source>In Int. Conf. on Data Science and Advanced Analytics</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>D.</given-names>
            <surname>Krompaß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nickel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Tresp</surname>
          </string-name>
          .
          <article-title>Querying factorized probabilistic triple databases</article-title>
          .
          <source>In ISWC</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>B.</given-names>
            <surname>London</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rekatsinas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Huang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Getoor</surname>
          </string-name>
          .
          <article-title>Multi-relational learning using weighted tensor decomposition with modular loss</article-title>
          .
          <source>In arXiv:1303.1733</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M.</given-names>
            <surname>Nickel</surname>
          </string-name>
          .
          <article-title>Tensor factorization for relational learning</article-title>
          .
          <source>PhD thesis</source>
          , Ludwig-Maximilian-University of Munich, Aug.
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>M.</given-names>
            <surname>Nickel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Tresp</surname>
          </string-name>
          .
          <article-title>Learning from latent and observable patterns in multi-relational data</article-title>
          .
          <source>In NIPS</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>M.</given-names>
            <surname>Nickel</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Tresp</surname>
          </string-name>
          .
          <article-title>Logistic tensor factorization for multi-relational data</article-title>
          .
          <source>In WSTRUC Workshop at ICML</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>M.</given-names>
            <surname>Nickel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tresp</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.-P.</given-names>
            <surname>Kriegel</surname>
          </string-name>
          .
          <article-title>A three-way model for collective learning on multi-relational data</article-title>
          .
          <source>In ICML</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>M.</given-names>
            <surname>Nickel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tresp</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.-P.</given-names>
            <surname>Kriegel</surname>
          </string-name>
          .
          <article-title>Factorizing YAGO: scalable machine learning for linked data</article-title>
          .
          <source>In WWW</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>S.</given-names>
            <surname>Rendle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B.</given-names>
            <surname>Marinho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nanopoulos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Schmidt-Thieme</surname>
          </string-name>
          .
          <article-title>Learning optimal ranking with tensor factorization for tag recommendation</article-title>
          .
          <source>In KDD</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          .
          <article-title>Reasoning with neural tensor networks for knowledge base completion</article-title>
          .
          <source>In NIPS</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>V.</given-names>
            <surname>Tresp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bundschus</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Rettinger</surname>
          </string-name>
          .
          <article-title>Materializing and querying learned knowledge</article-title>
          .
          <source>In IRMLeS, ESWC Workshop</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>