<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Scalable Kernel Approach to Learning in Semantic Graphs with Applications to Linked Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yi Huang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maximilian Nickel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Volker Tresp</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hans-Peter Kriegel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ludwig-Maximilians-Universität München</institution>
          ,
          <addr-line>Munich</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Siemens AG, Corporate Technology</institution>
          ,
          <addr-line>Munich</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we discuss a kernel approach to learning in semantic graphs. To scale the approach up to large data sets, we employ the Nyström approximation. We derive a kernel from the semantic relations in the local neighborhood of a node. Our approach can be applied to problems in multi-relational domains with several thousand graph nodes and more than a million potential links. We apply the approach to DBpedia data extracted from the RDF-graph of the Semantic Web's Linked Open Data (LOD).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        the presented approach, the scalability of the overall approach is guaranteed. First, we
can control the number of instances considered in the Nyström approximation. Second,
we can control the rank of the approximation. Third, we can control the number of local
features that are used to derive the kernel. A special case of our approach was already
presented in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. A novel contribution here is that we discuss the approach as
a general kernel approach using the Nyström approximation. Another novelty is that
we apply our approach to DBpedia [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which is based on information extracted from
Wikipedia. DBpedia is part of the Linked Open Data (LOD) cloud where the term
Linked Data is used to describe a method of exposing, sharing, and connecting data via
dereferenceable URIs on the Web [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>The paper is organized as follows. In the next section we discuss related work. In
Section 3 we review the kernel approximation based on the Nyström approximation
and we apply it to least-squares prediction. Section 4 defines our kernel approach to
learning in semantic graphs. Section 5 reports results based on DBpedia data. Section 6
presents our conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Recently, there has been considerable work on the relationship between kernels and
graphs. Graph kernels evaluate the similarity between graphs and can be classified into
three classes: graph kernels based on walks and paths, graph kernels based on
limited-size subgraphs, and graph kernels based on subtree patterns [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. It is not immediately
clear how those approaches can be used for link prediction. Link prediction on graphs
is quite related to semi-supervised learning as surveyed in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] where the goal is to
predict node labels based on known node labels in a graph. Kernels for semi-supervised
learning have, for example, been derived from the spectrum of the Graph-Laplacian.
In [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ] approaches for Gaussian process based link prediction have been presented.
Link prediction in relational graphs has also been addressed by the relational learning
and ILP communities [
        <xref ref-type="bibr" rid="ref13 ref14 ref15">13–15</xref>
        ]. Kernels for semantically rich domains have been
developed by [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        Most of the discussed kernel approaches cannot easily be applied to the rich
semantic domains considered here. In fact, many have been developed in the context of a
single object type and a single relation type. The experimental results on the semantic
kernels described in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] are still quite limited.
      </p>
      <p>Scalable Kernel Solutions Using the Nyström Approximation
Defining the Population
A semantic graph typically consists of many different types of objects and many types
of relations. In our statistical approach we only make statements on a subset of those
nodes, which form the statistical units or instances in the population. A statistical
unit is an object of a certain type, e.g., a person. The population is the set of
statistical units under consideration. In general, the population is application dependent. It is
advantageous if the population is homogeneous. E.g., the set of all students in Munich
might be a good choice, whereas the set that includes all students in Munich and all
professors in Berkeley might be problematic.</p>
      <p>
        The Nyström Approximation
We now assume that for any two instances i and j in the population a kernel ki,j is
defined. A subset of the population of size N , i.e., the sample, defines the training
set. Let K be the kernel matrix (i.e., Gram matrix) for the training instances. In many
applications N can be very large, therefore we now follow [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and use the Nyström
approximation to scale up kernel computations to large data sets.
      </p>
      <p>The Nyström approximation is based on an approximation to eigenfunctions and
starts with the eigendecomposition</p>
      <p>K = U D U^T    (1)

of the kernel matrix. The Nyström approximation to the kernel for two arbitrary
instances i and j can be written as</p>
      <p>k_{i,j} ≈ k_{.,i}^T U_r diag_r(1/d_l) U_r^T k_{.,j}

where diag_r(1/d_l) is a diagonal matrix containing the inverses of the r leading
eigenvalues in D and where U_r contains the corresponding r columns of U.⁴ Here, k_{.,i} is a
vector of kernels between instance i and the training instances.</p>
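      <p>As an illustrative sketch (not the paper's implementation), the approximation above can be computed with a few lines of linear algebra; the RBF kernel and the sizes below are arbitrary choices for the example:

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    # Example kernel; any positive semi-definite kernel k_{i,j} works here.
    return np.exp(-gamma * np.sum((a - b) ** 2))

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))    # N = 50 training instances
x_i, x_j = rng.normal(size=3), rng.normal(size=3)

# Kernel (Gram) matrix K on the training sample and its eigendecomposition K = U D U^T
K = np.array([[rbf(a, b) for b in X_train] for a in X_train])
d, U = np.linalg.eigh(K)              # eigenvalues in ascending order
r = 10
U_r, d_r = U[:, -r:], d[-r:]          # r leading eigenpairs

# Kernel vectors between the two new instances and the training sample
k_i = np.array([rbf(x_i, b) for b in X_train])
k_j = np.array([rbf(x_j, b) for b in X_train])

# Nystrom approximation: k_ij is approximated by k_i^T U_r diag_r(1/d_l) U_r^T k_j
k_ij_approx = k_i @ U_r @ np.diag(1.0 / d_r) @ U_r.T @ k_j
```

With r = N the eigenfactors reproduce K exactly; the savings come from choosing r much smaller than N.</p>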
      <p>
        There are two special cases of interest. First, the vector of approximate kernels
between a statistical unit i and all units in the training data can be written as

k_{.,i} ≈ U_r U_r^T k_{.,i}    (2)

and the matrix of approximate kernels between all pairwise units in the training data is

K ≈ U_r diag_r(d_l) U_r^T.    (3)

These modified kernels can now be used in kernel approaches such as SVM learning
or Gaussian process learning. In particular, the reduced-rank approximation in Equation 3
can greatly reduce the computational requirements [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].⁵
      </p>
      <p>Example: Regularized Least Squares Solutions for Multivariate Prediction
We now assume that for an instance i we have L targets or random variables y_i =
(y_{i,1}, . . . , y_{i,L})^T available. We want to train a model of the form ŷ_i = k_{.,i}^T W, where
W is an N × L weight matrix.</p>
      <p>A regularized least squares cost function can be formulated as</p>
      <p>
        trace[(Y − KW)(Y − KW)^T] + λ trace[W^T K W]

where Y = (y_1, . . . , y_N)^T and where λ ≥ 0 is a regularization parameter. If we use
the Nyström approximation for the kernels, we obtain as least-squares solution for the
weight matrix

W_LS = U diag_r(1/(d_l + λ)) U^T Y.

⁴ Based on this approximation, the rank of any kernel matrix is less than or equal to r ≤ N.
⁵ We use the Nyström approximation slightly differently from [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. There, Equation 1 is used
on a submatrix of K and Equation 2 is then used to approximate K.
      </p>
      <p>The prediction for the training data (i.e., in smoothing or transduction) is

Ŷ = U diag_r(d_l/(d_l + λ)) U^T Y

and in general

ŷ_i = k_{.,i}^T W_LS.</p>
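      <p>To make the least-squares solution concrete, here is a minimal sketch (with a synthetic Gram matrix; not the paper's code) of W_LS and the smoothed training predictions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, L, lam, r = 40, 3, 0.1, 10

# Synthetic positive semi-definite Gram matrix K = Z Z^T as a stand-in for a real kernel
Z = rng.normal(size=(N, 8))
K = Z @ Z.T
Y = rng.normal(size=(N, L))           # L targets per training instance

d, U = np.linalg.eigh(K)              # K = U D U^T, eigenvalues ascending
U_r, d_r = U[:, -r:], d[-r:]          # r leading eigenpairs

# W_LS = U diag_r(1/(d_l + lambda)) U^T Y, restricted to the r leading eigenpairs
W = U_r @ np.diag(1.0 / (d_r + lam)) @ U_r.T @ Y

# Smoothed predictions on the training data: Y_hat = U diag_r(d_l/(d_l + lambda)) U^T Y
Y_hat = U_r @ np.diag(d_r / (d_r + lam)) @ U_r.T @ Y
```

With the full rank r = N this coincides with the standard regularized solution W = (K + λI)⁻¹Y.</p>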
      <p>
        We now consider some special kernels. Assume that for each instance i, in addition
to the random variables of interest y_i, we also have covariates x_i available. Covariates
might, for example, represent aggregated information. If the kernel can be written as an
inner product of the covariates, k^x_{i,j} = x_i^T x_j, our Nyström approximation is equivalent
to regularized PCA regression in that covariate space. Another interesting solution is
when k^y_{i,j} = y_i^T y_j, in which case our Nyström approximation is equivalent to
regularized matrix reconstruction via PCA, often used in collaborative filtering. Note that in
the latter case the low-rank Nyström approximation is not only a necessity to obtain a
scalable solution but is also necessary to obtain valid predictions at all: with λ → 0
and r = N we would obtain the trivial Ŷ = Y. Finally, with k^z_{i,j} = z_i^T z_j, where
z_i = (α x_i^T, y_i^T)^T, we obtain the reduced rank penalized regression (RRPP) algorithm
in the SUNS framework [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Here, α is a positive weighting factor balancing the
influence of the two information sources.
      </p>
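      <p>The last special case is easy to verify numerically: stacking z_i = (αx_i^T, y_i^T)^T makes the combined kernel decompose as k^z_{i,j} = α²x_i^T x_j + y_i^T y_j. A small sketch with made-up data (the names and sizes are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
N, alpha = 30, 0.4
X = rng.normal(size=(N, 5))                         # covariates x_i
Y = rng.integers(0, 2, size=(N, 4)).astype(float)   # link indicators y_i

# z_i = (alpha * x_i^T, y_i^T)^T, hence K^z = alpha^2 X X^T + Y Y^T
Z = np.hstack([alpha * X, Y])
K_z = Z @ Z.T
```

Here α down-weights the covariates relative to the targets, as in the RRPP/SUNS setting.</p>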
    </sec>
    <sec id="sec-3">
      <title>Kernel for Semantic Graphs</title>
      <p>
        So far the discussion has been quite general and the Nyström approximation can be used
for any kernel defined between instances in the population. As discussed in Section 2,
there are a number of interesting kernels defined for nodes in a graph but most of them
are not directly applicable to the rich domain of a semantic graph with many
different node types and many different relation types. An exception is [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], which defines
kernels exploiting rich ontological background knowledge.
      </p>
      <p>
        We here present the kernel based on the SUNS framework [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The random
variables represent the likelihood of links where the statistical unit is the subject or object.
Additional features describe aggregated information. Although features are explicitly
calculated, a kernel approach is still preferred since in the applications that we are
considering the number of features can be quite large whereas N , the size of the sample,
can be controlled more easily.
      </p>
      <p>The Random Variables or Targets in the Data Matrix
⁶ Don’t confuse a random variable representing the truth value of a statement with a variable in
a triple, representing an object.</p>
      <p>If the machine learning algorithm predicts that a triple is very likely, we can enter this
triple in the semantic graph. We now add columns to the data matrix that provide
additional information for the learning algorithm but which we treat as covariates or fixed
inputs.</p>
      <p>First, we derive simplified relations from the semantic graph. More precisely, we
consider the expressions derived in the last subsection and replace constants by
variables. For example, from (?personA, knows, Jane) we derive (?personA, knows,
?personB) and count how often this expression is true for a statistical unit ?personA, i.e., we
count the number of friends of person ?personA.</p>
      <p>Second, we consider a simple type of aggregated covariate from outside a SUNS.
Consider first a binary triple (?personA, knows, Jane). If Jane is part of another
binary triple, in the example, (?personA, hasIncome, High) then we form the expression
(?personA, knows, ?personB) ∧ (?personB, hasIncome, High) and count how many rich
friends a person has. A large number of additional covariates are possible but so far we
restricted ourselves to these two types. The matrix formed with the N statistical units
as rows and the covariates as columns is denoted as X and the complete data matrix
becomes the matrix (X, Y ).</p>
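      <p>The two covariate types can be sketched on a toy triple set (the names below are invented for illustration and are not DBpedia data):

```python
from collections import defaultdict

triples = [
    ("Anna", "knows", "Jane"), ("Anna", "knows", "Bob"),
    ("Bob", "knows", "Jane"),
    ("Jane", "hasIncome", "High"),
]
units = ["Anna", "Bob"]               # the statistical units

# Covariate 1: count of (?unit, knows, ?personB) -- the number of friends
friends = defaultdict(list)
for s, p, o in triples:
    if p == "knows":
        friends[s].append(o)

# Covariate 2: count of (?unit, knows, ?b) AND (?b, hasIncome, High) -- rich friends
rich = {s for s, p, o in triples if p == "hasIncome" and o == "High"}

X = [[len(friends[u]), sum(1 for f in friends[u] if f in rich)] for u in units]
print(X)   # [[2, 1], [1, 1]]: Anna has 2 friends, 1 rich; Bob has 1 friend, 1 rich
```

In the real setting these counts are computed for every statistical unit and every simplified relation, yielding the matrix X.</p>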
      <p>Covariates are of great importance, in particular if statistical units are rather
disconnected. For example, to predict social status of two professors at different universities
in different countries, it might be relevant how many students they administer, but not
exactly which students, or it might be important that they are the dean of some
department, but not of which department. In social network terms: it might be relevant that
they play the same roles.
</p>
    </sec>
    <sec id="sec-4">
      <title>Experiments with DBpedia Data</title>
      <p>
        DBpedia Data
DBpedia [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is part of LOD and contains structured information extracted from Wikipedia.
At the time of writing this paper, it describes more than 3.4 million concepts,
including 312,000 persons, 413,000 places and 94,000 music albums. DBpedia not only
serves as a “nucleus for the web of data”, but also holds great potential to be used in
conjunction with machine learning approaches. Yet, even though DBpedia already provides
great value, it is still limited in the information it provides and in terms of quality. For
example, although many cities are covered in DBpedia, much of the information about a city, like
its most famous citizens and its most spectacular sights, is not very useful for machine
learning purposes. Here we report results using a population consisting of all members
of the German Bundestag to evaluate our approach. This population has been created
by collecting all triples that are returned by the SPARQL query
      </p>
      <p>SELECT ?s ?p ?o WHERE {
  ?s ?p ?o .
  ?s skos:subject dbp-cat:Members_of_the_German_Bundestag
}</p>
      <p>A great benefit of LOD data is that a single SPARQL query defines the sample.
While DBpedia has great potential for machine learning, there are also
challenges when these machine learning approaches are applied to DBpedia data. The first
issue is related to the problem of incomplete data. It is very common for subjects in
a DBpedia population to share only a subset of predicates. For instance, only 101
of 293 members of the German Bundestag represented in DBpedia have an entry for
the predicate dbp-ont:party or dbp-prop:party. Therefore, in order to
handle DBpedia data, a machine learning algorithm has to be able to deal with missing
or incomplete data. The second issue is related to noisy predicates. For predicates it
is often the case that there are semantical duplicates, e.g. dbp-prop:party and
dbp-ont:party. While duplicate predicates are not a big problem by default, they
can become a challenge when they are used inconsistently, which can greatly increase
the preprocessing effort. Third, even more serious than noisy predicates are noisy
objects. E.g. the Christian Democratic Union of Germany was represented by the
literals "CDU" and "Christian Democratic Union" or the resources dbpedia:
Christian Democratic Union and
dbpedia: Christian Democratic Union (Germany). Thus the true
members of the CDU would have been divided into four distinct subsets and this needs to
be resolved prior to learning. Finally, we have to consider the scale. The sample can get
quite large when all available DBpedia data in a population is used.
</p>
      <p>Predicting Party Membership
In the following experiments the learning challenge was to correctly predict the
political party for each subject, where the party is identified by the object of the predicate
dbp-prop:party. Duplicate predicates would bias the experiments as they are
heavily correlated with the target predicate. Therefore predicates like dbp-ont:party or
dbp-ont:Person/party were removed. Moreover, predicate-object pairs that are
very closely related to a party membership like (?s, skos:subject,
dbp-cat:Politicians of the Social Democratic Party of Germany)
or (?s, rdf:type, yago:GermanGreenPartyPoliticians) were also
removed. Rare features were sometimes pruned. In order to demonstrate the
aforementioned challenges associated with DBpedia data, we conducted the following
experiments:
– ORIG: The original data from DBpedia (version 3.5.1). After pruning, this data set
had N = 293 units, i.e., rows and 804 columns.
– DISAMB: In this experiment the objects of the target predicate were manually
disambiguated solving the noisy objects problem. After the disambiguation exactly
one concept (resource) for each party (CDU, CSU, SPD, FDP, Alliance ’90/The
Greens, The Left, Centre Party) remained in the data set. Thus, for each statistical
unit we estimate L = 8 variables. Furthermore, in the original data set only 101
of 293 statistical units had an entry for dbp-prop:party or dbp-ont:party.
Since machine learning algorithms benefit from a larger number of examples we
manually added the party for the remaining 192 units. After pruning, this data set
had 802 columns.
– AGE: In this experiment the age of each politician was added as a continuous
feature, by subtracting the birth year (when available) from the year 2010. To prevent
the age values from dominating the remaining columns, they were normalized.</p>
      <p>After pruning this data set had 804 columns.
– WEIGHT: We used a weighting coefficient of α = 0.4 to put less importance on
the covariates (see Section 3).
– STATE: The predicates dbp-prop:birthPlace or dbp-ont:birthPlace
specify the city or village of birth. For the members with no entry here, we filled
in the entry manually. Naturally, the birthplace is not a useful attribute for our task,
whereas the state of the birthplace can be quite valuable, since in Germany, there are
clear local party preferences. Filling in the state information from the birthplace
information can easily be done by exploiting geographical part-of-relationships with
OWL reasoning.
– TEXT: Finally associated textual information was exploited by tokenizing the
objects of the predicates rdf:comment and dbp-prop:abstract and by adding
one column for each occurring token. When a token was present for a particular
statistical unit, the entry was set to one, else to zero. After pruning the data set had
2591 columns.
– ALL: In this experiment all previously described approaches were combined. Since
the number of attributes changed, we also changed the weighting factor to α = 0.2.
After pruning this data set had 2623 columns.</p>
      <p>
        Except for ORIG, the basis for all experiments was the DISAMB data set. To
evaluate how well the party membership is predicted, we performed leave-one-out
cross-validation by iterating over all subjects. In each iteration we set all dbp-prop:party
entries for the subject of the current iteration to zero and used predicted estimates for
ranking. As evaluation measures we used NDCG and bpref [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], the latter being often
used in TREC tracks designed for evaluation environments with incomplete relevance
data.
      </p>
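      <p>For reference, a minimal sketch of NDCG as used in such rankings (the exact NDCG variant and the bpref implementation in the experiments may differ):

```python
import math

def ndcg(relevances):
    """NDCG for a ranked list of (binary or graded) relevance scores."""
    dcg = sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(pos + 2) for pos, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg([1, 0, 0]))   # 1.0 -- the correct party ranked first
print(ndcg([0, 1, 0]))   # about 0.63 -- the correct party only ranked second
```

A score of 1.0 corresponds to a perfect ranking of the parties for a subject.</p>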
      <p>Figure 2 shows the results for NDCG and bpref. As expected, the results obtained
from the raw data were worst with a score of 0.722. The effect of data cleaning from
disambiguation improved the score by 7 points. A small improvement in score can be
achieved by adding the age. This shows that age is a weak predictor of party
membership, at least in this Bundestag data set. Furthermore, an improvement in score can be
achieved by putting more weight on the quantity of interest, i.e., the party membership.
The textual description sometimes contains strong hints on party membership and the
score improves to 0.928. The state information is also quite relevant as an input, which
is well explained by the peculiarities of German politics. Finally, quite a high score of
0.963 is achieved by a combination of all methods.
</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Outlook</title>
      <p>We discussed a kernel approach for learning in semantic graphs. To scale up the
performance to large data sets, we employed the Nyström approximation. Furthermore, we
presented a kernel for semantic graphs derived from a local neighborhood of a node
and applied the approach to learning on the RDF-graph of the Semantic Web’s Linked
Open Data (LOD).</p>
      <p>To evaluate our approach, we applied it to data extracted from DBpedia. Here the
data is quite noisy and considerable preprocessing is needed to yield good results. Also,
by including textual data the prediction results were considerably improved. This
improvement can already be observed even if a simple keyword based representation is
being used without any sophisticated information extraction. Some of the data
preprocessing steps can easily be executed with ontological (OWL-) reasoning, such as the
generalization from city to state. In fact, materialization of facts derivable from logical
reasoning is recommended as a preprocessing step. Other preprocessing steps, such as
the calculation of age from the birthday and the current date, were done algorithmically.</p>
      <p>
        In the DBpedia experiment, we estimated the membership in the 8 parties for each
member in the Bundestag, thus L = 8. Although some members of the Bundestag
have been in more than one party during their careers, the collaborative coupling between the
random variables does not contribute much to the predictive performance. In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
experiments in social networks are described with L = 14425 and a much stronger
collaborative effect. As part of ongoing work we are studying a life-science domain
with several hundred thousand covariates and with L greater than 3000.
      </p>
      <p>Scalability of the overall approach is guaranteed. First, we can control the number
of instances considered in the Nyström approximation. Second, we can control the rank
of the approximation. Third, we can control the number of local features that are used
to derive the kernel. In our experiments, M, the number of features, was always quite
high. In this case the most costly computation is the calculation of the kernel, requiring
N²M operations.</p>
      <p>Acknowledgements: We acknowledge funding by the German Federal Ministry of
Economy and Technology (BMWi) under the THESEUS project and by the EU FP 7
Large-Scale Integrating Project LarKC.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Tauberer</surname>
          </string-name>
          , J.: Resource Description Framework, http://rdfabout.com/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Suchanek</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kasneci</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weikum</surname>
          </string-name>
          , G.:
          <article-title>Yago: a core of semantic knowledge</article-title>
          .
          <source>In: WWW</source>
          <year>2007</year>
          .
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Strube</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ponzetto</surname>
            ,
            <given-names>S.P.</given-names>
          </string-name>
          :
          <article-title>Wikirelate! computing semantic relatedness using wikipedia</article-title>
          .
          <source>In: AAAI</source>
          <year>2006</year>
          .
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Tresp</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bundschus</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rettinger</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Materializing and querying learned knowledge</article-title>
          .
          <source>In: Proceedings of the First ESWC Workshop on Inductive Reasoning and Machine Learning on the Semantic Web</source>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bundschus</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tresp</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rettinger</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kriegel</surname>
            ,
            <given-names>H.P.</given-names>
          </string-name>
          :
          <article-title>Multivariate structured prediction for learning on the semantic web</article-title>
          .
          <source>In: Proceedings of the 20th International Conference on Inductive Logic Programming (ILP)</source>
          .
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobilarov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ives</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Dbpedia: A nucleus for a web of open data</article-title>
          .
          <source>The Semantic Web</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heath</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Linked data - the story so far</article-title>
          .
          <source>International Journal on Semantic Web and Information Systems (IJSWIS)</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Vishwanathan</surname>
            ,
            <given-names>S.V.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schraudolph</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kondor</surname>
            ,
            <given-names>R.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borgwardt</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Graph kernels</article-title>
          .
          <source>Journal of Machine Learning Research - JMLR</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. Gärtner, T.,
          <string-name>
            <surname>Lloyd</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flach</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Kernels and distances for structured data</article-title>
          .
          <source>Machine Learning 57(3)</source>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Semi-supervised learning literature survey</article-title>
          .
          <source>Technical report, Computer Sciences TR 1530 University of Wisconsin Madison</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tresp</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Stochastic relational models for discriminative link prediction</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          (NIPS*
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kersting</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tresp</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Multi-relational learning with Gaussian processes</article-title>
          .
          <source>In: Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09)</source>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Taskar</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>M.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abbeel</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koller</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Link prediction in relational data</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          (NIPS*
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Muggleton</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lodhi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amini</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sternberg</surname>
            ,
            <given-names>M.J.E.</given-names>
          </string-name>
          :
          <article-title>Support vector inductive logic programming</article-title>
          . In:
          <string-name>
            <surname>Hoffmann</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motoda</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scheffer</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , eds.: Discovery Science, 8th International Conference, DS 2005. Volume
          <volume>3735</volume>
          of LNCS., Springer (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Landwehr</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passerini</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Raedt</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frasconi</surname>
          </string-name>
          :
          <article-title>kFOIL: Learning simple relational kernels</article-title>
          .
          <source>In: National Conference on Artificial Intelligence (AAAI)</source>
          .
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>D'Amato</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fanizzi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Esposito</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Non-parametric statistical learning methods for inductive classifiers in semantic knowledge bases</article-title>
          .
          <source>In: IEEE International Conference on Semantic Computing - ICSC 2008</source>
          . (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>C.K.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seeger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Using the Nyström method to speed up kernel machines</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          <volume>13</volume>
          . (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Tresp</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bundschus</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rettinger</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Towards Machine Learning on the Semantic Web</article-title>
          .
          <source>In: Uncertainty Reasoning for the Semantic Web I. Lecture Notes in AI</source>
          , Springer (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Buckley</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Voorhees</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          :
          <article-title>Retrieval evaluation with incomplete information</article-title>
          .
          <source>In: Proc. 27th ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          . (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>