<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Joint Tensor Decomposition on RDF Graphs</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>AKSW Group</institution>
          ,
          <addr-line>Leipzig</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The decomposition of tensors has on multiple occasions shown state-of-the-art performance on classical Semantic Web tasks such as link prediction. However, the structure of the LOD cloud has not been taken into account in these developments so far. In particular, the decentralized architecture underlying Linked Data suggests that the incorporation of distributed information into these decompositions might lead to better factorization results. The goal of this thesis is hence to develop and evaluate models for joint tensor factorization on RDF graphs that arise from parameter estimation in statistical models or algebraic approaches. In this paper, we focus on presenting the problem we address formally and on discussing preliminary experiments on semi-synthetic data. Our experiments yield promising results in spite of being derived from an ad-hoc approach.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The Semantic Web landscape is populated by large interlinked RDF graphs containing
partly redundant and often complementary knowledge. Since many knowledge graphs
contain subgraphs of information on the same entities, we argue that it would be
desirable to obtain a more complete picture of the relations between these resources
while performing typical semantic tasks like knowledge extraction, entity matching or
link prediction. Embedding the entities of those graphs into latent spaces that consider
relations over multiple knowledge bases opens up new avenues for improvement on
established methods for this family of semantic tasks, while simultaneously lending
itself to new tasks like the generation of synthetic knowledge graphs for benchmarking
or the detection of graph patterns. On the other hand, latent feature models have shown
state-of-the-art performance on the aforementioned tasks, especially when processing
very large graphs [13, 7]. However, the state of the art focuses on one graph at a time
and commonly does not take the topology of the Linked Data Web into consideration.</p>
      <p>The aim of this thesis is to develop factorization approaches that can efficiently
process groups of graphs by performing the decompositions of their adjacency tensors
jointly, and to apply these decompositions in knowledge extraction and enrichment tasks.
The intuition underlying this work is that by extracting latent features with respect to
several graphs concurrently, we implicitly allow information to permeate the
boundaries of single graphs and thus achieve more accurate factorizations. To achieve this goal
and jointly factorize even the largest knowledge bases, we aim to develop scalable and
time-efficient approaches that are able to incorporate meta-information like literals and
links between entities in different knowledge bases, while also stemming from sensible
and interpretable models. Thus, this thesis will focus on the research of joint
decompositions of knowledge base adjacency tensors and their applications to semantic tasks
like the detection of graph patterns, in order to achieve better results on real-world
applications like query answering systems.</p>
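The adjacency-tensor view of an RDF graph used throughout this work can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation; the triples and URIs are invented for the example. Each predicate yields one n × n slice, with entry (k, i, j) set to 1 iff the triple (entity_i, predicate_k, entity_j) is present.

```python
# Illustrative sketch: encode an RDF graph as a binary adjacency tensor
# of shape (num_predicates, num_entities, num_entities).
import numpy as np

triples = [
    ("db:Leipzig", "dbo:country", "db:Germany"),
    ("db:Berlin", "dbo:country", "db:Germany"),
    ("db:Germany", "dbo:capital", "db:Berlin"),
]

entities = sorted({t[0] for t in triples} | {t[2] for t in triples})
predicates = sorted({t[1] for t in triples})
e_idx = {e: i for i, e in enumerate(entities)}
p_idx = {p: k for k, p in enumerate(predicates)}

X = np.zeros((len(predicates), len(entities), len(entities)), dtype=np.int8)
for s, p, o in triples:
    X[p_idx[p], e_idx[s], e_idx[o]] = 1  # one slice per predicate

print(X.shape)  # (2, 3, 3)
print(X.sum())  # 3
```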
    </sec>
    <sec id="sec-2">
      <title>Relevancy</title>
      <p>The results of this research could lead to latent feature models that incorporate
information that is spread out globally over multiple RDF graphs. A large number of tasks
(especially tasks which require abstract representations of resources) can profit from these
results, including (1) question answering, (2) structured machine learning, (3) named
entity linking and (4) relation extraction. For example, question answering frameworks
could use the latent features to map slots in a natural-language query to entries in a
SPARQL query. In relation extraction, the latent features of resources could potentially
improve the selection of sentences for extracting RDF triples from text. Named entity
linking could profit from the latent features we generate by providing numeric hints
towards entity mentions in natural language.</p>
      <p>The runtime and accuracy of the provided solutions will be the deciding factors
pertaining to their adoption by the Semantic Web community and the corresponding
industry. Hence, we aim to develop time-efficient methods and apply them to the
aforementioned tasks. Furthermore, to reason about these algorithms, it would be desirable to
derive them explicitly from model hypotheses on the structure of real-world knowledge
bases. Thus, we will develop factorization models that abide by the characteristics of
Linked Data. We hope that our results will play a key role towards a new generation of
tools for processing large amounts of interlinked data.</p>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>There is a wealth of knowledge on tensor factorization in the literature. For the seminal
survey on this topic, see Kolda et al. [6]. On the topic of single tensor factorization, most
models are based on either the DEDICOM [2] or the CP [3] family of decompositions.
Nickel et al. proposed a relaxed DEDICOM model, referred to as RESCAL [12], that is
shown to be highly scalable [13] while still exhibiting state-of-the-art performance on
smaller datasets. Building on this, London et al. [10] suggested a similar model, adding
a bias term and using the learned model as input for a neural network. Kolda et al. [5]
proposed a CP model to identify citation communities and measure document
similarity. Kuleshov et al. [8] proposed to find the CP factors of their model via simultaneous
matrix diagonalization.</p>
      <p>On the topic of joint tensor factorization, Khan et al. [4] proposed a Bayesian model for
the problem of multi-tensor factorization. They relaxed the constraints of the problem
to use normal distributions for inference of the CP factors. The joint factorization
of matrices and tensors has also shown improvements over singular approaches, see [1].</p>
    </sec>
    <sec id="sec-4">
      <title>Research Question</title>
      <p>To address the issues stated above, we plan to analyze and generalize existing approaches
and to use the insights gained there to develop new techniques and benchmark their
performance on knowledge extraction and integration tasks against the current state of
the art. As such, the research questions and corresponding subquestions at the heart of
our work are as follows:
– "What existing approaches can be applied to joint factorization?"</p>
      <p>Which existing approaches can be interpreted as stemming from statistical
models? By increasing the sample size, one could generalize them to multiple
knowledge bases.</p>
      <p>Can we derive update rules that are computable on large datasets, by combining
different cost functions or in a distributed manner?
– "How can we use the resulting embeddings to improve upon existing tasks and what
new avenues can be explored?"</p>
      <p>Can one improve the performance of the translation of natural-language queries
into formal query languages (SPARQL) by using embeddings that respect the
spread of RDF data on the web? Can we use these latent embeddings for the generation
of synthetic graphs which still capture the structure of RDF data on the web? This
would be useful for benchmarking various querying systems and the like.</p>
    </sec>
    <sec id="sec-5">
      <title>Hypotheses</title>
      <p>The basic hypothesis behind our work is that using the architecture underlying the
Linked Data Web can lead to the computation of better latent embeddings.
We hence aim to develop novel methods for tensor factorization that make use of
distributed information to compute embeddings. The approaches we develop will aim to
be time-efficient and easy to distribute so as to scale to billions of triples. Moreover, we
will consider datasets that are updated regularly and aim to derive update rules or
modularization techniques that will ensure an efficient update of latent features.</p>
    </sec>
    <sec id="sec-6">
      <title>Preliminary results</title>
      <p>To begin testing the potential of a simple variant of joint decomposition, we chose to
use the proven RESCAL-ALS [12] method on a data dump of the Side Effect Resource
SIDER (http://linkeddatacatalog.dws.informatik.uni-mannheim.de/dataset/fu-berlin-sider/resource/ee6c1709-f2a6-4790-8549-152232dd9d93).
From that data, we built an adjacency tensor that does not differentiate between literals
and entities. We arrived at a full tensor of shape $11 \times 30378 \times 30378$ that
models the complete knowledge contained in the RDF graph. To model delocalized
information, we produced two more tensors $X, Y$ of the same shape by twice deleting a
percentage $p$ of entities from the full tensor, taking care that entities deleted in $X$
would be present in $Y$ and vice versa; this ensures that no global information is being
deleted.</p>
      <p>The task at hand was to reconstruct the global knowledge contained in the full tensor
based exclusively on the partial knowledge contained in $X$ and $Y$. To that end, we chose
to search for solutions of the RESCAL form that solve the problem
$$(\tilde{R}, \tilde{A}) = \arg\min_{(R,A)} \|ARA^t - X\|_2^2 + \|ARA^t - Y\|_2^2 + \mathrm{reg}(A, R) \qquad (1)$$
with $A \in \mathbb{R}^{30378 \times r}$, $R \in \mathbb{R}^{11 \times r \times r}$ for some $r &lt; 30378$ and
$$\mathrm{reg}(A, R) = \|A\|_2^2 + \|R\|_2^2.$$</p>
      <p>Take note that for all solutions of (1), one also has
$$(\tilde{R}, \tilde{A}) = \arg\min_{(R,A)} \left\|ARA^t - \frac{X + Y}{2}\right\|_2^2 + \mathrm{reg}(A, R), \qquad (2)$$
which can be seen by summing over the identity
$$2\left((ARA^t)_{ijk} - \frac{X_{ijk} + Y_{ijk}}{2}\right)^2 = \left((ARA^t)_{ijk} - X_{ijk}\right)^2 + \left((ARA^t)_{ijk} - Y_{ijk}\right)^2 - \frac{1}{2}\left(X_{ijk} - Y_{ijk}\right)^2,$$
whose last term does not depend on $A$ or $R$. This allows us to approximate solutions
of (1) by using the RESCAL-ALS algorithm developed in [12].</p>
      <p>To measure the goodness of fit, we first chose a rounding threshold $t$ and rounded
the entries of $ARA^t$ against it. After that, we calculated the F1-measure (F1),
precision (P) and recall (R) of $ARA^t$ with respect to the full tensor and compared
results for different choices of $t$, $r$ and $p$. We wrote the algorithm in Python and
all experiments were performed on an Intel Core i5-6500 CPU @ 3.20GHz × 4 with 16GB of
RAM. As a baseline, we chose the best performing RESCAL fit of the incomplete tensors
$X$ and $Y$; the first table holds the best results of our baseline tests, with all
values rounded to two decimal places. To enable a fine-grained comparison, we include
all values of $t$ in the table that holds the values of the joint approach. As one can
see, the joint approach outperforms the baseline by as much as 25% on the parameter
pair ($p = 30\%$, $r = 15$).</p>
      <p>The large changes of the F1-measure in the joint approach at $t = 0.5$ can be
explained by the fact that the algorithm tries to approximate the tensor $(X + Y)/2$.
Take note that
$$\left(\frac{X + Y}{2}\right)_{ijk} = \begin{cases} 0 &amp; \text{if } X_{ijk} = Y_{ijk} = 0 \\ 1 &amp; \text{if } X_{ijk} = Y_{ijk} = 1 \\ 1/2 &amp; \text{if } X_{ijk} \neq Y_{ijk}. \end{cases} \qquad (3)$$
Thus, by rounding up at $0.5$, one will begin to see the values that were correctly
learned to be $1/2$, among others.</p>
      <p>To remedy that situation, we used the ad-hoc approach of learning the tensor
$\max(X, Y)_{ijk} = \max(X_{ijk}, Y_{ijk})$ under the same quadratic loss function,
using the same RESCAL-ALS algorithm. Take note that, by our synthetic construction of
$X$ and $Y$, $\max(X, Y)$ represents the full knowledge graph; thus, the parameter $p$
is irrelevant here. A sample of the results obtained is contained in the following table.
As one can see, the results improved compared with learning the arithmetic mean.
This hints that there is improvement to be had by reweighting the entries in the joint
decomposition, which we will explore in our future work.</p>
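The reduction of the two-term joint loss against X and Y to a single loss against the arithmetic mean (X + Y)/2 can be checked numerically. The following sketch (illustrative shapes, not the experiment code) verifies the identity for a random stand-in Z for the reconstruction ARAᵗ:

```python
# Numerically verify: ||Z - X||^2 + ||Z - Y||^2
#                   = 2 * ||Z - (X + Y)/2||^2 + (1/2) * ||X - Y||^2,
# so minimizing over Z against both tensors equals minimizing against their mean.
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(11, 20, 20)).astype(float)
Y = rng.integers(0, 2, size=(11, 20, 20)).astype(float)
Z = rng.random((11, 20, 20))  # stands in for A R A^t

lhs = ((Z - X) ** 2).sum() + ((Z - Y) ** 2).sum()
rhs = 2 * ((Z - (X + Y) / 2) ** 2).sum() + 0.5 * ((X - Y) ** 2).sum()
print(np.isclose(lhs, rhs))  # True
```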
    </sec>
    <sec id="sec-7">
      <title>Approach</title>
      <p>To exemplify the general approach, consider the statistical model $(S, p)$ with the
measurable space
$$S := (\Omega, P(\Omega)), \quad \text{with } \Omega := \prod_{m} \{0, 1\}^{k \times n \times n}, \qquad (4)$$
equipped with a probability measure
$$p : P(\Omega) \to [0, 1], \quad \{X_1, \ldots, X_m\} \mapsto p(\{X_1, \ldots, X_m\}). \qquad (5)$$</p>
      <p>This is the general setting; by choosing different parametric families of probability
measures, one can use inference methods to obtain estimates for the parameters.
Take note that the choice of $S$ makes this approach novel, because the usual methods for
statistical inference on semantic adjacency tensors use measurable spaces of the form
$(\{0, 1\}^{k \times n \times n}, P(\{0, 1\}^{k \times n \times n}))$ to model the
probabilities for links on a single tensor. This is well known and appears frequently in
the Semantic Web literature. For instance, in [11] the authors use the probability function
$$p(X \mid A, R) := \prod_{i,j,k} \tilde{p}(x_{ijk} \mid \langle A_i, R_k A_j \rangle) \qquad (6)$$
with $\tilde{p}(x_{ijk} = 1 \mid \langle A_i, R_k A_j \rangle) := \sigma(\langle A_i, R_k A_j \rangle)$,
where $\sigma$ denotes the logistic function. They then chose to estimate the parameters
$A$ and $R$ by minimizing the negative log-likelihood. By extending the sample space to
multiple tensors, one has a larger sample size for each individual entry, which will
lead to better parameter estimations.</p>
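The single-graph model of [11] described above can be sketched as follows. This is an illustration with toy shapes, not the authors' implementation: it scores every triple as the bilinear form ⟨A_i, R_k A_j⟩, squashes it through the logistic function, and evaluates the Bernoulli log-likelihood of an observed tensor X.

```python
# Sketch of the logistic link-probability model:
# p(x_ijk = 1) = sigmoid(<A_i, R_k A_j>), entries treated as independent Bernoullis.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n, r, k = 6, 3, 2                 # entities, latent rank, predicates (toy sizes)
A = rng.normal(size=(n, r))       # entity embeddings
R = rng.normal(size=(k, r, r))    # one interaction matrix per predicate
X = rng.integers(0, 2, size=(k, n, n))  # observed adjacency tensor

# scores[k, i, j] = A_i^T R_k A_j
scores = np.einsum('ir,krs,js->kij', A, R, A)
probs = sigmoid(scores)

# Bernoulli log-likelihood; A and R would be fit by minimizing its negative.
loglik = np.sum(X * np.log(probs) + (1 - X) * np.log(1 - probs))
print(np.isfinite(loglik))  # True
```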
      <p>Furthermore, if one has to deal with multiple knowledge bases, the question of
confidence arises. To solve the problem of observing a link in knowledge base A but not
in knowledge base B, one would have to introduce confidence values into the model to
make informed predictions.</p>
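One possible (hypothetical) realization of such confidence values is to weight each knowledge base's term in the joint quadratic loss, so that a link observed only in a low-confidence source pulls the reconstruction less. The weighting scheme below is an assumption for illustration, not a design fixed by this thesis:

```python
# Confidence-weighted joint loss: sum over sources of c_m * ||Z - X_m||_2^2,
# where c_m encodes how much we trust knowledge base m.
import numpy as np

def weighted_joint_loss(Z, tensors, confidences):
    """Hypothetical per-source weighting of the quadratic reconstruction loss."""
    return sum(c * ((Z - X) ** 2).sum() for X, c in zip(tensors, confidences))

rng = np.random.default_rng(2)
X_a = rng.integers(0, 2, size=(2, 5, 5)).astype(float)  # trusted source
X_b = rng.integers(0, 2, size=(2, 5, 5)).astype(float)  # noisy source
Z = rng.random((2, 5, 5))                               # candidate reconstruction

# Down-weighting the noisy source can only reduce the total loss.
print(weighted_joint_loss(Z, [X_a, X_b], [1.0, 0.3])
      <= weighted_joint_loss(Z, [X_a, X_b], [1.0, 1.0]))  # True
```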
      <p>Another approach will be to model the random events in our model as Markov
processes. This would be appropriate for situations where the source of our data is
multiple versions of a knowledge graph at different points in time. The ability to
incorporate Markov processes on graphs into our model is theoretically highly valuable,
since they possess great modelling power for real-world processes.</p>
    </sec>
    <sec id="sec-8">
      <title>Evaluation Plan</title>
      <p>To test our hypotheses, we will apply our derived algorithms in three stages. Stage one
will be a test on synthetic or semi-synthetic data, where we will build tensors from
real-world data according to synthetic rules to model the existence of a tensor that
contains the global knowledge, and measure the performance of the model using the
F1-measure, just as in our Preliminary Results section.</p>
      <p>Stage two will be a test on real-world tensors which contain a large number of related
entities, e.g., YAGO [14] and DBpedia [9]. Thus, we are abstracting the existence of a
global knowledge tensor, while keeping the same evaluation scheme as in stage one.
Finally, in stage three, we will apply the latent factors we have benchmarked in stage two
to fields such as relation extraction, entity linking and question answering to boost their
performance.</p>
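The stage-one evaluation described above can be sketched concisely: round a reconstructed tensor at a threshold t and compute precision, recall and F1 against the full tensor. The arrays below are toy inputs; the real reconstruction would come from the factorization.

```python
# Threshold-based evaluation of a reconstructed adjacency tensor against the
# ground-truth full tensor, as used in stage one of the evaluation plan.
import numpy as np

def prf1(reconstruction, full, t=0.5):
    pred = (reconstruction >= t).astype(int)
    tp = int(((pred == 1) & (full == 1)).sum())
    fp = int(((pred == 1) & (full == 0)).sum())
    fn = int(((pred == 0) & (full == 1)).sum())
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

full = np.array([[1, 0], [0, 1]])
recon = np.array([[0.9, 0.4], [0.2, 0.6]])
print(prf1(recon, full, t=0.5))  # (1.0, 1.0, 1.0)
```

Sweeping t over a grid, as done in the preliminary experiments, then reduces to calling `prf1` once per threshold.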
    </sec>
    <sec id="sec-9">
      <title>Reflections</title>
      <p>As our preliminary results show, one can expect better latent embeddings by
considering multiple sources of relational data. While there is a wealth of knowledge for the
factorization of matrices or tensors, there is still relatively little work done on the joint
factorization of tensors. Furthermore, by considering quite general propability
functions, we can model sophisticated dependencies of different knowledge graphs.
This will be the start of this PHD thesis.</p>
      <p>Acknowledgements This work has been supported by Eurostars projects DIESEL
(E!9367) and QAMEL (E!9725) as well as the European Union’s H2020 research and
innovation action HOBBIT (GA 688227).</p>
      <p>
        Also, I want to thank my advisors Dr. Axel-C. Ngonga Ngomo, Ricardo Usbeck and
the whole team at AKSW for all their helpful comments and fruitful coffee-driven
conversations.
      </p>
      <p>
9. J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hellmann,
M. Morsey, P. van Kleef, S. Auer, and C. Bizer. DBpedia - a large-scale, multilingual
knowledge base extracted from Wikipedia. Semantic Web Journal, 6(2):167–195, 2015.
10. B. London, T. Rekatsinas, B. Huang, and L. Getoor. Multi-relational learning using weighted
tensor decomposition with modular loss. CoRR, abs/1303.1733, 2013.
11. M. Nickel and V. Tresp. Logistic tensor factorization for multi-relational data. ArXiv
e-prints, June 2013.
12. M. Nickel, V. Tresp, and H.-P. Kriegel. A three-way model for collective learning on
multi-relational data. In L. Getoor and T. Scheffer, editors, Proceedings of the 28th
International Conference on Machine Learning (ICML-11), pages 809–816, New York, NY,
USA, June 2011. ACM.
13. M. Nickel, V. Tresp, and H.-P. Kriegel. Factorizing YAGO: Scalable machine learning for
linked data. In Proceedings of the 21st International Conference on World Wide Web, WWW
'12, pages 271–280, New York, NY, USA, 2012. ACM.
14. F. M. Suchanek, G. Kasneci, and G. Weikum. Yago: A core of semantic knowledge. In
Proceedings of the 16th International Conference on World Wide Web, WWW '07, pages
697–706, New York, NY, USA, 2007. ACM.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>E.</given-names>
            <surname>Acar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Rasmussen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Savorani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Naes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Bro</surname>
          </string-name>
          .
          <article-title>Understanding data fusion within the framework of coupled matrix and tensor factorizations</article-title>
          .
          <source>Chemometrics and Intelligent Laboratory Systems</source>
          ,
          <volume>129</volume>
          :
          <fpage>53</fpage>
          -
          <lpage>63</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Harshman</surname>
          </string-name>
          .
          <article-title>Models for analysis of asymmetrical relationships among n objects or stimuli</article-title>
          .
          <source>In First Joint Meeting of the Psychometric Society and the Society for Mathematical Psychology</source>
          , McMaster University, Hamilton, Ontario,
          <year>1978</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>F. L.</given-names>
            <surname>Hitchcock</surname>
          </string-name>
          .
          <article-title>The expression of a tensor or a polyadic as a sum of products</article-title>
          .
          <source>J. Math. Phys</source>
          ,
          <volume>6</volume>
          (
          <issue>1</issue>
          ):
          <fpage>164</fpage>
          -
          <lpage>189</lpage>
          ,
          <year>1927</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Leppäaho</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Kaski</surname>
          </string-name>
          .
          <article-title>Multi-tensor factorization</article-title>
          .
          <source>arXiv preprint arXiv:1412.4679</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>T.</given-names>
            <surname>Kolda</surname>
          </string-name>
          .
          <article-title>Multilinear algebra for analyzing data with multiple linkages</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>T. G.</given-names>
            <surname>Kolda</surname>
          </string-name>
          and
          <string-name>
            <given-names>B. W.</given-names>
            <surname>Bader</surname>
          </string-name>
          .
          <article-title>Tensor decompositions and applications</article-title>
          .
          <source>SIAM Rev</source>
          .,
          <volume>51</volume>
          (
          <issue>3</issue>
          ):
          <fpage>455</fpage>
          -
          <lpage>500</lpage>
          , Aug.
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>D.</given-names>
            <surname>Krompaß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nickel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Tresp</surname>
          </string-name>
          .
          <article-title>Large-scale factorization of type-constrained multirelational data</article-title>
          .
          <source>In Data Science and Advanced Analytics (DSAA)</source>
          , 2014 International Conference on, pages
          <fpage>18</fpage>
          -
          <lpage>24</lpage>
          . IEEE,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>V.</given-names>
            <surname>Kuleshov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Chaganty</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          .
          <article-title>Tensor factorization via matrix factorization</article-title>
          .
          <source>CoRR, abs/1501.07320</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>