<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantic Relation Composition in Large Scale Knowledge Bases</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Web Science Research Group</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>University of Mannheim</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Germany</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>Semantic relation composition is a generalized approach for nding conjunctive relation paths in a knowledge base (KB). In semantic web, direct and inverse relationships between entities provide us with ample of explicit knowledge. But there is a plethora of implicit knowledge beyond these direct paths. Consider a knowledge graph, we can achieve deeper insights about a particular entity if we consider the information shared by its neighboring entities via its adjacent relation paths of arbitrary lengths. In this paper, we devise a technique to automatically discover semantically enriched conjunctive relations in a KB. Our approach is generalized for any KB and requires no additional parameter tuning. Particularly, we employ classical rule mining techniques to perform relation composition on knowledge graphs to learn rst order rules. We evaluate our proposed methodology on two state of the art information extraction systems, DBpedia and Yago with promising results in terms of generating high precision rules. Furthermore, we make the rules publicly available for community usage.</p>
      </abstract>
      <kwd-group>
        <kwd>Relation composition</kwd>
        <kwd>Knowledge Bases</kwd>
        <kwd>Rule mining</kwd>
        <kwd>Ontologies</kwd>
        <kwd>Information Extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        The Semantic Web research has gained a lot of prominence in the last few years.
The large body of work in this area are in themselves an indicator for the
potential of semantic representation. Every real world entity can have a semantic
representation and is associated with a multitude of other entities with
relationships. These relationships de ne association patterns between an entity pair. For
instance, a statement like Porsche was founded in Stuttgart can be semantically
represented by a directed relationship as dbo:foundationPlace(db:Porsche,
db:Stuttgart)1 Also note that, the semantic representation necessarily need
not be via a direct relation, but can be equally stated with an inverse one. This
1 We used the DBpedia [
        <xref ref-type="bibr" rid="ref13 ref3">3,13</xref>
        ] vocabulary here, but it can have any other
semantics from other knowledge sources. The pre xes dbo refer to relation and
concept namespace (http://dbpedia.org/ontology/) and db to instances/resources
(http://dbpedia.org/resource/) in DBpedia. In the rest of the paper we omit these
pre xes for brevity.
set of direct and/or inverse relations to/from a particular entity expresses
explicit knowledge about the entity. But, often we miss out the more interesting,
latent information, just by ignoring the relationship patterns beyond the explicit
ones. For instance, exploring the entity relationships with other entities might
unravel more information. For instance, country(Stuttgart, Germany) makes
it easy to deduce that Porsche has an implicit connection with Germany. Such a
path connecting a source entity (Porsche) with a target one (Germany) is called
a relation path. And the number of transitions we execute from the source to the
target is de ned as hops. In particular, this is a 2-Hop scenario, schematically
represented as ( foundationP lac!e countr!y ).
      </p>
      <p>
        Now, we have two issues. First, a relation path of length n may not be
semantically correct. An arbitrary conjunction of relations might not be useful
or be the best choice. Second, if we nd a semantically valid relation path, there
is a degree of uncertainty associated with their authenticity. This allows us to
formally de ne our problem statement. Let pi be an arbitrary relation in some
KB, K. If we have a conjunctive relation path of length n comprising of n such
KB relations, then we want to nd out if there is another relation h expressing
the similar semantics as the path itself. This is referred to as a semantic relation
composition, keeping close parity with role composition in ontology. Formally,
(p1,.., pi,.., pn) 2 K, 9 h 2 K,
conf : p1 ^
^ pi ^
^ pn
h
(1)
In the rest of the paper we refer to this as a n-Hop rule. These rules are annotated
with conf [
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ], the con dence score for the rules which indicates the degree
of belief that h is indeed the representative of the conjunctive relation path.
Intuitively, the primary goal of this work is to nd a simpli ed representation of
conjunction of n KB relations. In this context, we envision the applicability of
our contribution in the following use cases:
Semantic Enrichment of Relations Each relation sequence in the body of
a rule (p1 ^ ^ pn in Expression 1 above) represents an obscure connection
between the entities. While the head relation (h in Expression 1) of the rule is
an hypothesis that it might be a simpli ed representation of the conjunction. In
other words, we nd out an alternate (shorter) path of traversing from the source
entity to the target. Hence the name, relation composition. The rational is if we
can nd such paths, then they are semantically equivalent as they connect the
same source-target pair of entities.
      </p>
      <p>Detection of Transitive Relations: Rules composed of recurring relations like
p ^ ^ p ) p can be exploited to extract candidate transitive relations. This
schema detection is possible due to the structure of the derived rules. Especially if
we observe this rule with a varying number of relations in the body, the evidence
of the relation p being transitive is much higher.</p>
    </sec>
    <sec id="sec-2">
      <title>Amendment of the KB with Missing Facts: The extracted rules can be</title>
      <p>used to generate facts missing in the KB by discovering links between entities.
Our target is to generate composed rules with high con dence. These rules can
be used as templates to deduce new KB facts. For instance, if we know with
considerable accuracy that headquarter(a,b) ^ place(b,c) ) location(a,c),
then a set of entities a; b; c 2 K satisfying the rule body can be used to deduced
that location(a,c). The higher the con dence of such a rule, higher is the
condence that the inferred fact is true. However, the higher the con dence, lesser
missing links can be inferred.</p>
      <p>Query Join Improvement: One of the major contribution of this work can
be considered for semantic query performance improvements. Using composed
relations instead of longer chains of relation paths in a query allows lesser number
of joins to be performed. For instance, in terms of SPARQL, the query select ?b
where fdb:IBM dbo:location ?bg will have a faster response time than select ?b
where fdb:IBM dbo:headquarter ?b. ?b dbo:place ?cg.</p>
      <p>In this paper, we make the following contributions, (i) proposing a KB
independent, generalized and parameter tuning free approach to extract composed
form of a set of conjunctive relations (ii) providing a document and ranked list
of con dence annotated rules for the community. In the subsequent Section 2 we
detail our methodology to nd relations which can be composed in a given KB.
We report on our evaluation schemes and experimental setup in Section 3. We
highlight some of the closely related works in Section 4 and nally conclude in
Section 5 with some possible scopes of extending this work.
2
2.1</p>
      <sec id="sec-2-1">
        <title>Methodology</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Rule Extraction</title>
      <p>In this section, we focus on the details of semantic relation composition by
extracting rst-order logic rules from a KB. Since, a KB is essentially a collection
of facts, we provide an excerpt from DBpedia consisting of three facts:
dbo:deathPlace(db:Marek Edelman,db:Warsaw)
dbo:battle(db:Marek Edelman,db:Warsaw Uprising)
(2)
dbo:place(db:Warsaw Uprising,db:Warsaw)
These sample facts are depicted as an knowledge graph in Fig. 1. As observed,
relation deathPlace links Marek Edelman with Warsaw directly. However, the
knowledge graph illustrates a second way of reaching Warsaw starting from
Marek Edelman: The rst hop via relation battle ends in Warsaw Uprising
and a subsequent hop from Warsaw Uprising via relation place nally ends
in Warsaw. This 2-Hop pattern connecting Marek Edelman with Warsaw can be
represented as a general logical conjunction of the individual relation hops as
predicates and their respective entities2 formulated in Equation 3:
battle(ME,WU) ^ place(WU,W)
(3)
2 For better legibility, Marek Edelman is abbreviated with M E, Warsaw Uprising with</p>
      <p>W U and Warsaw with W
db:Marek_
Edelman
dbo:battle</p>
      <p>dbo:place
db:
Warsaw_</p>
      <p>Uprising
dbo:deathPlace</p>
      <p>db:
Warsaw
This is a 2-Hop relation path, since the rst entity can reach the third entity
by applying two hops. This reveals that such a 2-Hop path in the KB from
Marek Edelman to Warsaw can be transformed to a rst-order logic representation
by introducing the direct path via deathPlace:</p>
      <p>battle(ME,WU) ^ place(WU,W) ) deathPlace(ME,W)
In this rule the logical conjunction of the 2-Hop path is the body of the rule and
the relation deathPlace is the head of the rule.</p>
      <p>In this example Warsaw Uprising serves as the join entity enabling the
semantic composition as it appears in the range of the rst relation and in the
domain of the subsequent relation. A transformation of this rule containing
explicit entities of our KB to a more general rule can be achieved by replacing all
literals of the predicates with variables ei:</p>
      <p>battle(e0; e1) ^ place(e1; e2) ) deathPlace(e0; e2)
We should note that the rule above might still appear to be a candidate rule if
deathPlace is replaced with birthPlace. We will discuss shortly that, we do not
lter o any rule manually, rather we let the evidence based rules to determine
its accuracy. The semantically weaker rule patterns are automatically weighted
lesser than its stronger counterparts. With this simple example, we framed the
notion of creating a rst-order logic rule with semantic relation composition
by exploiting conjunctive relation paths. We now consider a general knowledge
graph structure (shown in Fig. 2) for formulating an abstract pattern for rules.
As the head entity e0 is linked to the tail entity en with n-Hops over relations
p1; :::; pn and both entities can be connected either directly by pdir(e0; en) or
inversely by pinv(en; e0), the two general and formalized rst-order logic rules
can be extracted as :
p1(e0; e1) ^ p2(e1; e2) ^ ::: ^ pn(en 1; en) ) pdir(e0; en)
p1(e0; e1) ^ p2(e1; e2) ^ ::: ^ pn(en 1; en) ) pinv(en; e0)
For rule extraction either one of these two patterns linking the head and tail
entity can be used. These can be considered as templates for rule extraction. For
a knowledge base, K and n relation hops between head and tail entity, the set
(4)
(5)
(6)
Head entity
p1</p>
      <p>p2
e1
pn-1</p>
      <p>pn
en-1
pdir
…….
pinv</p>
      <p>en
Tail entity
RKn contains all extractable n-Hop rules:</p>
      <p>
        RKn = fp1 ^ p2 ^ ::: ^ pn ) h jpi; h 2 Kg
(7)
Hence, the example rule in Equation 5 is a concrete element of the set RD2BP edia.
Our goal is to generate these sets for varying values of n. In order to generate
actual instances of such rules, we use SPARQL over public KB endpoints. The
subsequent part describes a query template for a KB in RDF [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] with the
SPARQL query language to extract KB structures depicted in g. 2. Note that
the following query is only pseudo-SPARQL.
      </p>
      <p>PREFIX kb : &lt; knowledgebase /&gt;
SELECT ? p1 ? p2 ... ? pn ? pdir COUNT (? pdir )
WHERE {
? p1 a kb : prop ... ? pn a kb : prop . ? pdir a kb : prop .
? e0 ? p1 ? e1 . ? e1 ? p2 ? e2 ... ?e[n -1] ? pn ? en .</p>
      <p>? e0 ? pdir ? en
}
GROUP BY ? p1 ? p2 ... ? pn ? pdir
ORDER BY ? p1 ? p2 ... ? pn ? pdir</p>
      <p>Listing 1.1: Main SPARQL query template for extracting n-Hop rules
In Listing 1.1, we de ne a pre x to refer to the current knowledge base. The
W HERE part describes the basic graph pattern that matches the structure in
g. 2. Hence, the relations p1; :::; pn from the body of the rule are described by
variables ?p1; :::; ?pn which are explicitly tagged with type kb:prop. The second
part consisting of n triples of the form ?ei ?pi+1 ?ei+1 formulates the join
between the entities that are connected over the relations pi in a sequential fashion.
Finally the triple ?e0 ?pdir ?en directly connects the head and tail entity with
relation ?pdir. For additionally discovering rule patterns connecting the head
entity e0 with the tail en inversely, the previously described statement ?e0 ?pdir ?en
must be substituted with ?en ?pinv ?e0 by just ipping head and tail entity. The
Person
birthPlace</p>
      <p>PopulatedPlace</p>
      <p>Person
playsForTeam</p>
      <p>SportsClub
domain
range
domain
range
grouping of this graph by all the n-Hop path relations and the head relation
allows to aggregate the instances of each rule and nally to select the number of
instances which ful ll a speci c rule. With varying n, we can generate speci c
rules satisfying Equation 6.</p>
      <p>An easy way to extract rules from a KB for a speci c relation list P is to build
up queries with explicit relations for the n-Hop relation path instead of variables
as in Listing 1.1 and only extract possible heads of a rule. This algorithm has a
runtime of O(jP jn), where n is the hop length. To speed up the semantic relation
composition, the algorithm avoids querying the knowledge base for n-Hop
relation combinations p1; :::; pn which do not have joining concepts. It is evident from
Equation 6, 4, that the conjunction of two relations is possible due to the join
entity which co-occurs as object in the relation pi 1 but as subject in pi. We use this
feature to enhance our query time. We perform a simple check for dom(pi) and
ran(pi 1), which respectively maps relation pi to its domain concept and relation
pi 1 to its range concept. We present this in Figure 3, which shows the
phenomena for a 2-Hop relation composition birthPlace(e0; e1) ^ playsForTeam(e1; e2)
with their corresponding domain and range concepts. Obviously, the algorithm
prunes this combination because PopulatedPlace 6v Person. Since there is no
entity which has the mutually exclusive ontological concepts PopulatedPlace
and Person at the same time, no joining entity would be found enabling this
relation composition and thus a pruning is appropriate.
2.2</p>
    </sec>
    <sec id="sec-4">
      <title>Rule Cycles</title>
      <p>In this semantic relation composition approach where direct relations are
expressed with conjunctive relations paths, cycles between entities on the path
potentially occur. Figure 4 depicts a general knowledge graph structure
containing a cycle in the relation path from head to tail entity. Entities ei and ej are
linked by (j i) relation hops and inversely connected over relation pcyc. This
knowledge graph cycle can be included arbitrary many times in the body of the
rule. In order to avoid capturing cycles within the extracted rules, all entities
e0; :::; en on the path must be pairwise distinct, el 6= em for l 6= m. Subsequently,
to ensure that this restriction is ful lled and cycles are prohibited by excluding
recurring entities, the following lter conditions are attached to the previously
speci ed SPARQL query template:
FILTER (! regex (? e0 , str (? e1 )))
...</p>
      <p>ei
ej
...</p>
      <p>en-1</p>
      <p>en
...
pcyc
pdir
....</p>
      <p>FILTER (! regex (? e0 , str (? en )))
FILTER (! regex (? e1 , str (? e2 )))
...</p>
      <p>FILTER (! regex (? e1 , str (? en )))
...</p>
      <p>FILTER (! regex (? e[n -1] , str (? en )))</p>
      <p>Listing 1.2: SPARQL lters for the avoidance of cycles
2.3</p>
    </sec>
    <sec id="sec-5">
      <title>Rule Metrics</title>
      <p>
        In the introductory section, we mentioned two major sub problems. In this
section, we discuss ways to quantify each of the rst-order logic rules learnt from
the KB. Often, there were scenarios where we observed very little evidence in
support of some candidate semantic relation conjunctions. This motivated us
to quantify each rule. As the rst step, we gathered the evidences or the exact
number of instances we observed the rule to be true. Each rule instance
corresponds to a ground rule and counts as one evidence in favor of the general rule.
Furthermore, to assign a normalized score to each rule, we employed association
rule mining technique and metrics to this end. In particular, we compute the
support and con dence [
        <xref ref-type="bibr" rid="ref11 ref16">11,16</xref>
        ] for each rule.
      </p>
      <p>Obviously the straightforward way of learning a weight for a rule revealing
the amount of evidence for this rule in the KB, is to count the number of (n + 1)
valued tuple (e0; :::; en) evaluating the n-Hop rule r 2 RKn to true. Therefore, we
formally de ne the following set Instr:</p>
      <p>Instr = f(e0; :::; en)jr : p1(e0; e1) ^ p2(e1; e2) ^ ::: ^ pn(en 1; en) ) hg
(8)
An element of the instance set of our introducing 2-Hop example consists of three
valued tuple (Marek Edelman,Warsaw Uprising,Warsaw) depicted in Fig.1 since
it evaluates the rule in Equation 5 to true. The weight Wr of a rule r is de ned
as the number of instances ful lling the rule, i.e. cardinality of Instr :
Wr = jInstrj
(9)
The next rule metric we use is support, denoted by supp(r), which speci es the
number of instances of a rule r obtained among all rule instances of all rules in</p>
      <p>Rule
knowsPerson(e0; e1) ^ knowsPerson(e1; e2) ) knowsPerson(e0; e2)</p>
      <p>isMarriedTo(e0; e1) ^ hasChild(e1; e2) ) hasChild(e0; e2)
affiliatedTo(e0; e1) ^ affiliatedTo(e1; e2) ) affiliatedTo(e0; e2)
RSL(rk,ri)
the KB. Therefore supp(r) allows to compare the weight of a rule r with all the
other weights and essentially represents a relative weight. Formally,
And con dence, conf (r), of a rule r 2 RKnB is the conditional probability that
the head of the rule occurs under the condition that the body of the rule occurred
already. The two functions Body(r) and Head(r) refer to the body and the head
of the rule, respectively:
conf (r) =
supp(Body(r) \ Head(r))</p>
      <p>supp(Body(r))
This implies that the more often entity triples (e0; e1; e2) which ful ll the
2Hop relation conjunction battle(e0; e1) ^ place(e1; e2) ful ll also the predicate
deathPlace(e0; e2), the higher the con dence for the rule of Equation 5. We must
note that the con dence is dependent on the number of evidences satisfying the
rule or in a way the size of the KB. Our goal is to generate highly accurate
rules and its composition and it is a rational choice to weigh such rules based on
observed evidence. More often we see a rule pattern to hold true in a KB, more
con dent we are. In our situation, we necessarily do not consider other measures
to consider rules which might be semantically correct but lack su cient instances
to prove their correctness.</p>
      <p>The metrics introduced so far cater to individual rules only. But, we might
be interested to know a set of rules likely to hold true under a given condition.
We introduce the Rule Similarity Likelihood (RSL), a conditional probability
indicating rule r2 to hold given that rule r1 holds. Formally,</p>
      <p>RSL(r1; r2) = P (r2jr1) = jInstr1 \ Instr2 j</p>
      <p>Wr1
This is also a measure of semantic similarity between two rules r1; r2 2 RKn .
The corresponding instance sets Instr1 and Instr2 are compared by computing
the overlapping amount with a conditional probability. Intuitively, the higher
the number of rule instances satisfying r1 and rule r2, the higher the RSL for
these rules. Table 1 illustrates an example top-3 list of the RSL given that the
following rule, rk, holds:
knows(e0; e1) ^ knows(e1; e2) ) knows(e0; e2)
(10)
(11)
(12)
3.1</p>
      <sec id="sec-5-1">
        <title>Experiments</title>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Datasets</title>
      <p>
        For evaluation and experiments, we chose two state of the art knowledge bases.
Yago : A KB which extracts facts from Wikipedia. Moreover it integrates from
other knowledge sources like the lexical database WordNet [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and the
geographical database GeoNames3. For our experiments we used only the yagoF acts
package from the CORE theme. This collection consists of about 5.5 Mio. facts
build up by 36 relations.
      </p>
      <p>DBpedia : A popular KB which also extracts structured information from
Wikipedia. The extraction patterns therefore harvest facts from Wikipedia info
boxes and exploit the underlying structure to generate high quality knowledge.
We used the English version 3.9 consisting of about 580 Mio. facts and a ltered
subset of 672 relations relevant for our rule extraction approach.
These two KBs are state of the art IE systems and considered large with respect
to the information content. We choose these for their easy availability, wide usage
and large number of relations. Furthermore, these are well structured KBs which
allows queries over public endpoints (DBpedia) or data bases (Yago).
Rule Set: We have made the learnt rules publicly available for research use4. It
is associated with a README le and rules for both DBpedia and Yago.
3.2</p>
    </sec>
    <sec id="sec-7">
      <title>Annotation Scheme</title>
      <p>For both the KBs we restrict our extraction and evaluation to 2-Hop and 3-Hop
rst-order logic rules. The top-k rules with descending con dence for each KB
and for each n = 2; 3 were annotated manually, allowing a computation of the
precision at k (p@k). In Expression 7 we de ned the complete set of n-Hop rules.
For the evaluation purpose, we decided to prune these sets based on a minimum
weight. This means, from a selected set of weight thresholds of W = 1; 100; 1000,
we select only those rules from RKn which have at least weight W . Any rule with
a lower weight is rejected. Such a set is denoted as RKn;W .</p>
      <p>For each experiment setting (di erent hops), three top-k lists are generated
with the di erent weight thresholds. Top-k rule lists with a low W threshold
usually have the top rules with high con dence values but low weights. These
are often annotated as false since they are too speci c to a particular context
and cannot be considered a general rule pattern. The di erent weight thresholds
enable a better analysis of the in uence of the weight for the rule quality. For
each top-k list we annotated the rst 100 rules manually. Every one of those rules
were presented to the annotator with additional example instances as evidences
for the rule. Ones which were incorrect were marked so. The following rule is an
example for an incorrect marked rule:
canonizedPlace(e0; e1) ^ city(e1; e2) ) deathPlace(e0; e2)
(14)
3 http://www.geonames.org/
4 http://web.informatik.uni-mannheim.de/adutta/SRL.tar.gz</p>
      <p>Rule
isMarriedTo(e0; e1) ^ hasChild(e1; e2) ) hasChild(e0; e2)
dealsWith(e0; e1) ^ isLocatedIn(e1; e2) ) isLocatedIn(e0; e2)</p>
      <p>dealsWith(e0; e1) ^ dealsWith(e1; e2) ) dealsWith(e0; e2)
hasChild(e0; e1) ^ isPoliticianOf(e1; e2) ) isPoliticianOf(e0; e2)
hasChild(e0; e1) ^ isAffiliatedTo(e1; e2) ) isAffiliatedTo(e0; e2)
W</p>
      <p>conf supp
Table 2 shows the top-5 rules of RY2 AGO;100. Notice that even though the rst
rule appears to be a very good rule in terms of its semantics, this top rule has
only a con dence value of 0.57 and the con dence already dropped to 0.18 for the
15th rule. This rapid falling of the con dence values from the rules is continued
to the 130th rule of the top-k list having only a con dence value of 1% left. Table
3 shows the top-5 rules of RD2BP edia;100. The con dence values are very high with
a top con dence of 100% and decrease slowly in comparison with Yago 2-Hop
rules. However, these top-5 rules have very similar semantics since all of them
have the form p(e0; e1) ^ country(e1; e2) ) country(e0; e2). Such a rule list is
the expected outcome since spatial as well as part-of relationships are strongly
present in a KB like DBpedia. Thus, these rules have not only high evidence in
the KB expressed by high weights but also high con dence values. Most of the
shown rules describe part-of relations for di erent regions in di erent countries
like federal states, arrondissements, cantons and counties.</p>
      <p>Figure 5 (I) shows the p@k graphs for several top-k rule lists for RY2 AGO;W
with the three di erent weight thresholds denoted by w1; w100; w1000.
However, as visualized in the graph there are two di erent plots for W = 1, namely
w1:hasGender and w1, respectively. After the initial experiments of YAGO
during the annotation, we noticed that the relation hasGender is contained in the
resulting rules. For the following evaluation we discarded rules containing this
relation from the several top-k lists. But to give an impression of the quality
of the rule list including this property, we annotated one top-k list including
hasGender with weight threshold W = 1 for comparative reasons. The top-k
list with weight threshold W = 1000 has very high p@k values for small k. This
0.8
0.7
is well-founded by the fact that the top rules have high con dence values plus
high weights which produces more general and subsequently more true rules.
The higher the weight threshold, the higher the precision of the rules. Figure 5
(III),(IV) shows the p@k graphs for the three weight thresholds for RD2BP edia. In
the data sets we di erentiated between heads pre xed by dbo5 and dbp6. Figure
5 (II) illustrates the p@k for Yago 3-Hop rules. In general, the precision for
rules extracted from DBpedia is higher than for Yago . For each rule set with
a weight threshold W the number of extracted rules are illustrated in Table 4. In
addition, the total number of rules with inverse heads is shown. The number of
extracted 3-Hop rules from DBpedia is very low since we not nished analyzing
DBpedia completely. For brevity, we omitted the remaining 3-Hop results but
in general the precision is lower in comparison with that of 2-Hop rules.</p>
      <sec id="sec-7-1">
        <title>5 http://dbpedia.org/ontology/</title>
        <p>
          6 http://dbpedia.org/property/
W = 1
W = 100
W = 1000
W = 1(inv:)
Anyanwu et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] propose an approach for the ranking of complex relationship
search results on the Semantic Web called SemRank. The three presented
Semantic Associations between two entities are composed of property sequences
linking them semantically similarly to our n-Hop connection of head and tail
entity. They introduce metrics to evaluate these di erent Semantic Associations
holding between two entities of interest allowing nally an ordering of search
results. The key feature of the ranking is the concept of predictability of a result
which can be modulated and tuned by the user. The predictability is composed
of the uniqueness of a result and the amount of deviance or refraction of a result
path from the paths represented in the ontology schema.
        </p>
        <p>
          A related work on discovering relation sequences between two entities of
interest is the RelF inder proposed in [
          <xref ref-type="bibr" rid="ref14 ref9">9,14</xref>
          ]. This approach also focuses on
exploiting extensive knowledge bases for revealing interesting semantic
relationships between entities. Since exploring knowledge bases visually by knowledge
graph representation can be very complicated on account of the vast number of
entities and corresponding links between them, they propose an approach which
simpli es the exploration process. Similarly to the proposed notion in this paper,
they not only seek for direct relations between the two entities of interest but
also for multiple hop relations.
        </p>
        <p>
          Lao et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] introduce the Path Ranking Algorithm (PRA) for probabilistic
inference in large KBs. This algorithm enables a generation and prediction of
new facts by executing random walks on the knowledge graph. PRA ranks graph
nodes relative to a query node representing the probability that these two nodes
are in fact semantically associated. The ranking of a single node is achieved by
combining the rankings of the node from di erent paths in a linear model where
each path feature is weighted.
        </p>
        <p>
          Fleischhacker et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] devise a technique to automatically reveal
underlying axioms of relations by applying association rule mining on RDF data. This
approach is able to detect property axioms like subsumption and disjointness,
transitivity, symmetry and functionality based on con dent association rules.
The property axioms are generated by creating transaction tables containing the
actual instances and their respective relations. For example, to detect symmetric
properties the transaction table contains information about direct and inverse
relations holding between entities simultaneously. These transaction tables are
mined for rules which are eventually converted to actual property axioms.
        </p>
        <p>
          Another approach described by Abedjan et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] focuses on enhancement
and enrichment of RDF data by employing association rule mining similarly to
our approach. This rule-based approach enables predicate suggestion
supporting users when creating new data entries, enrichment and amendment of the
KB with missing facts and ontology schema improvement. Since the foundation
for rule mining are facts in the SPO triples representation, they achieve the
extraction of rules for di erent use cases by changing the context and target
of mining. For example, in one con guration they mine subjects in the context
of predicates. The resulting association rules hold between subjects with same
predicates like Merkel ! Obama since they are both persons and politicians and
enable a clustering of similar subjects.
5
        </p>
        <sec id="sec-7-1-1">
          <title>Conclusion &amp; Future Work</title>
          <p>Our empirical results prove that our proposed approach is in general able to
extract semantically consistent rules from a knowledge base. This approach is
strong in identifying typical relation hierarchies in knowledge bases. However, by
just considering the con dence without including the weight into the
examination, the quality of the rules is generally low. Thus, discovering good semantic
relation compositions from a KB is accomplished by incorporating high thresholds
for rule weights and con dence values. An appropriate weight threshold strongly
depends on the used knowledge base but as empirically seen, for W 100 we
achieved good results. Furthermore, the quality of 2-Hop rules is higher than the
quality of 3-Hop rules. For growing n we therefore assume a rapid drop in the
quality of the rules, due to greater inclusion of noisy rules.</p>
          <p>
            A major future work would be to extend the approach for other KBs.
Therefore, experiments with the community created Knowledge Graph in Freebase
[
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] will be introduced to provide more variety of experimental rule sets and
to adapt the implementation to MQL7. For enriching our rule extraction, we
also consider other knowledge sources like ConceptNet8 and Cyc9. Since Yago,
DBpedia and Freebase are structured IE systems, we would also apply our
methodology on unstructured sources, which focus on probabilistic KBs,
typically Open Information Extraction (OIE) [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] systems (Nell [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ], Reverb [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]).
We expect to face more challenges with OIE, since the entities are often not
typed but free form texts, hence, entity linking and disambiguation would be an
initial solution.
          </p>
        </sec>
      </sec>
      <sec id="sec-7-2">
        <title>7 https://developers.google.com/freebase/v1/mql-overview 8 http://conceptnet5.media.mit.edu 9 http://www.cycfoundation.org</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Ziawasch</given-names>
            <surname>Abedjan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Felix</given-names>
            <surname>Naumann</surname>
          </string-name>
          .
          <article-title>Improving rdf data through association rule mining</article-title>
          .
          <source>Datenbank-Spektrum</source>
          ,
          <volume>13</volume>
          (
          <issue>2</issue>
          ):
          <volume>111</volume>
          {
          <fpage>120</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Kemafor</given-names>
            <surname>Anyanwu</surname>
          </string-name>
          , Angela Maduko, and
          <string-name>
            <given-names>Amit</given-names>
            <surname>Sheth</surname>
          </string-name>
          .
          <article-title>Semrank: ranking complex relationship search results on the semantic web</article-title>
          .
          <source>In Proceedings of the 14th international conference on World Wide Web</source>
          , pages
          <volume>117</volume>
          {
          <fpage>127</fpage>
          . ACM,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Soren Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and
          <string-name>
            <given-names>Zachary</given-names>
            <surname>Ives</surname>
          </string-name>
          .
          <article-title>Dbpedia: A nucleus for a web of open data</article-title>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Kurt</given-names>
            <surname>Bollacker</surname>
          </string-name>
          , Colin Evans, Praveen Paritosh, Tim Sturge, and
          <string-name>
            <given-names>Jamie</given-names>
            <surname>Taylor</surname>
          </string-name>
          . Freebase:
          <article-title>a collaboratively created graph database for structuring human knowledge</article-title>
          .
          <source>In Proceedings of the 2008 ACM SIGMOD international conference on Management of data</source>
          , pages
          <volume>1247</volume>
          {
          <fpage>1250</fpage>
          . ACM,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Carlson</surname>
          </string-name>
          , Justin Betteridge, Bryan Kisiel, Burr Settles,
          <string-name>
            <surname>Estevam R Hruschka Jr</surname>
          </string-name>
          , and Tom M Mitchell.
          <article-title>Toward an architecture for never-ending language learning</article-title>
          .
          <source>In AAAI</source>
          , volume
          <volume>5</volume>
          , page 3,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Oren</given-names>
            <surname>Etzioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Anthony</given-names>
            <surname>Fader</surname>
          </string-name>
          , Janara Christensen, Stephen Soderland, and
          <string-name>
            <surname>Mausam</surname>
            <given-names>Mausam</given-names>
          </string-name>
          .
          <article-title>Open information extraction: The second generation</article-title>
          .
          <source>In IJCAI</source>
          , volume
          <volume>11</volume>
          , pages
          <fpage>3</fpage>
          {
          <fpage>10</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Anthony</given-names>
            <surname>Fader</surname>
          </string-name>
          , Stephen Soderland, and
          <string-name>
            <given-names>Oren</given-names>
            <surname>Etzioni</surname>
          </string-name>
          .
          <article-title>Identifying relations for open information extraction</article-title>
          .
          <source>In Proceedings of the Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <volume>1535</volume>
          {
          <fpage>1545</fpage>
          . Association for Computational Linguistics,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Fleischhacker</surname>
          </string-name>
          , Johanna Volker, and Heiner Stuckenschmidt.
          <article-title>Mining rdf data for property axioms</article-title>
          .
          <source>In On the Move to Meaningful Internet Systems: OTM</source>
          <year>2012</year>
          , pages
          <fpage>718</fpage>
          {
          <fpage>735</fpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Heim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Hellmann</surname>
          </string-name>
          , Jens Lehmann, Ste en Lohmann, and
          <string-name>
            <given-names>Timo</given-names>
            <surname>Stegemann</surname>
          </string-name>
          .
          <article-title>Rel nder: Revealing relationships in rdf knowledge bases</article-title>
          .
          <source>In Semantic Multimedia</source>
          , pages
          <volume>182</volume>
          {
          <fpage>187</fpage>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Graham</given-names>
            <surname>Klyne</surname>
          </string-name>
          and
          <string-name>
            <surname>Jeremy J Carroll</surname>
          </string-name>
          .
          <article-title>Resource description framework (rdf): Concepts and abstract syntax</article-title>
          .
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Kenneth</given-names>
            <surname>Lai</surname>
          </string-name>
          and
          <string-name>
            <given-names>Narciso</given-names>
            <surname>Cerpa</surname>
          </string-name>
          .
          <article-title>Support vs. con dence in association rule algorithms</article-title>
          .
          <source>In Proceedings of the OPTIMA Conference</source>
          , Curico,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Ni</surname>
            <given-names>Lao</given-names>
          </string-name>
          , Tom Mitchell, and William W Cohen.
          <article-title>Random walk inference and learning in a large scale knowledge base</article-title>
          .
          <source>In Proceedings of the Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <volume>529</volume>
          {
          <fpage>539</fpage>
          . Association for Computational Linguistics,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Jens</surname>
            <given-names>Lehmann</given-names>
          </string-name>
          , Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas,
          <string-name>
            <surname>Pablo N Mendes</surname>
            ,
            <given-names>Sebastian</given-names>
          </string-name>
          <string-name>
            <surname>Hellmann</surname>
          </string-name>
          , Mohamed Morsey, Patrick van Kleef,
          <source>Soren Auer</source>
          , et al.
          <article-title>Dbpedia{a large-scale, multilingual knowledge base extracted from wikipedia</article-title>
          .
          <source>Semantic Web</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Jens</given-names>
            <surname>Lehmann</surname>
          </string-name>
          , Jorg Schuppel, and Soren Auer.
          <article-title>Discovering unknown connectionsthe dbpedia relationship nder</article-title>
          .
          <source>CSSW</source>
          ,
          <volume>113</volume>
          :
          <fpage>99</fpage>
          {
          <fpage>110</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>George</surname>
            <given-names>A</given-names>
          </string-name>
          <string-name>
            <surname>Miller</surname>
          </string-name>
          .
          <article-title>Wordnet: a lexical database for english</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>38</volume>
          (
          <issue>11</issue>
          ):
          <volume>39</volume>
          {
          <fpage>41</fpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Pang-Ning Tan</surname>
            and
            <given-names>Vipin</given-names>
          </string-name>
          <string-name>
            <surname>Kumar</surname>
          </string-name>
          .
          <article-title>Chapter 6. association analysis: Basic concepts and algorithms</article-title>
          . Introduction to Data Mining.
          <source>Addison-Wesley. ISBN</source>
          ,
          <volume>321321367</volume>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>