<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Query-Time Reasoning in Uncertain RDF Knowledge Bases with Soft and Hard Rules</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Ndapandula Nakashole</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabian Suchanek</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mauro Sozio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Theobald</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Institut Mines-Telecom; Telecom ParisTech</institution>
          ,
          <addr-line>CNRS, Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Max-Planck Institute for Informatics</institution>
          ,
<addr-line>Saarbrücken</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Max-Planck Institute for Informatics</institution>
          ,
<addr-line>Saarbrücken</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Max-Planck Institute for Informatics</institution>
          ,
<addr-line>Saarbrücken</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>7</lpage>
      <abstract>
<p>Recent advances in information extraction have paved the way for the automatic construction and growth of large, semantic knowledge bases from Web sources. However, the very nature of these extraction techniques entails that the resulting RDF knowledge bases may contain a significant amount of incorrect, incomplete, or even inconsistent (i.e., uncertain) factual knowledge, which makes efficient query answering over this kind of uncertain RDF data a challenge. Our engine, coined URDF, augments first-order reasoning by a combination of soft rules (Datalog-style implications), which are grounded in a deductive fashion in order to derive new facts from existing ones, and hard rules (mutual-exclusiveness constraints), which enforce additional consistency constraints among both base and derived facts. At the core of our approach is an efficient approximation algorithm for this constrained form of the weighted MaxSAT problem with soft and hard rules, allowing us to dynamically resolve inconsistencies directly at query time. Experiments on real-world and synthetic data confirm the high robustness and significantly improved runtimes of our framework in comparison to state-of-the-art MCMC techniques.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        The recent advent of information extraction techniques has
enabled the automatic construction and growth of large, semantic
knowledge bases from Web sources. Knowledge bases such as
DBpedia [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], YAGO [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], Freebase.com, and TrueKnowledge.com
consist of many millions or even billions of facts, which are usually
captured in the form of RDF-style subject-predicate-object (SPO)
triples. Moreover, the Linked-Data initiative (LinkedData.org)
encompasses these and many other RDF datasets, along with
extensive cross-linkage in the form of owl:sameAs properties between
entities in different data sources. For high coverage of entities
and their properties, it is natural to use automated, often heuristic
or probabilistic, methods to populate these knowledge bases, and,
perhaps, also to automatically establish owl:sameAs links.
Consequently, these data sources may contain a significant fraction of
noise and incorrect triples. However, even if we accept such data
errors and inconsistencies, no knowledge base can ever be
complete, and even the entire Linked-Data cloud can hardly ever cover
all interesting properties of relevant entities.
      </p>
      <p>
        Research on knowledge base construction has adopted
probabilistic models and statistical learning for cleaning the gathered
pool of raw fact candidates (typically extracted from Wikipedia
and textual or semi-structured Web pages). A powerful instrument
to this end is reasoning with consistency constraints (see, e.g., [
        <xref ref-type="bibr" rid="ref16 ref23 ref26 ref6 ref7">6,
7, 16, 23, 26</xref>
        ]). For example, specifying that a person can have
only one spouse (at a given snapshot in time) helps to
distinguish marriages from romantic affairs and to remove false
hypotheses for isMarriedTo triples. On the other hand, keeping only those
triples with the highest confidence (or likelihood) of being correct
is an unduly eager and conservative approach. For example, when
searching for musician couples where both partners have won a
Grammy award, correct answers, such as John Lennon and Yoko
Ono, may well be composed out of low-confidence input triples,
as the join predicates in the query impose additional restrictions
and could implicitly “de-noise” the answers. For example, the
Lennon/Ono marriage is not in any of the Wikipedia infoboxes, and
respective extractions from free text should have lower confidence.
      </p>
      <p>When query answers are empty because critical pieces of
knowledge are missing, reasoning over uncertain facts can be helpful to
produce—at least—likely or speculative results. To this end,
deduction rules are an instrument to infer answers that go beyond the
extensionally represented content of the knowledge base. These
rules themselves may be uncertain as well. For example, suppose
we want to find the doctoral advisor of Alon Y. Halevy. Often
(but not always) the senior author on the first few papers of a
researcher is the doctoral advisor. Based on such a rule, we could
deduce that Yehoshua Sagiv is Halevy’s advisor. Moreover,
deduction rules cover a wide range of RDF/S and OWL-based
reasoning concepts, such as the owl:TransitiveProperty of predicates (e.g.,
for the rdfs:subClassOf property or over owl:sameAs links), which
lie at the expressive intersection of Datalog-style deductive
reasoning and OWL-DL. Additional consistency constraints, on the other
hand, cover OWL concepts such as the owl:FunctionalProperty or
owl:disjointWith properties of predicates and classes.</p>
      <p>In summary, we emphasize the following desiderata for
query-time reasoning over uncertain RDF triples:
1) to give answers to complex SPARQL queries over triples with
highly varying confidence values;
2) to overcome the incompleteness problem by exploiting deduction rules,
which may themselves be uncertain, and to infer answers even
if some critical triples are missing;
3) to counter amplified noise and keep query-result precision high
by taking consistency constraints into account in the specific context
of a user query;
4) to achieve all of the above with high efficiency, so that queries can
be answered with interactive response times of a few seconds.
Our aim with URDF is to address the above desiderata in an
integrated manner. Our implementation is based on top-down,
on-demand grounding of rules expressed in first-order logic, together
with a constrained form of weighted MaxSAT solving.
Considering hard constraints jointly with MaxSAT reasoning over
propositional clauses poses additional challenges; to our knowledge, these
have not been addressed in prior work for interactive, query-time
reasoning.</p>
    </sec>
    <sec id="sec-2">
      <title>DEFINITIONS AND NOTATIONS</title>
      <p>We are given a set X of Boolean variables, each variable taking
either the value true or false. The negation of a variable x ∈ X
(denoted as ¬x) has the value true if and only if x is assigned false.
We shall refer to a variable x or its negation ¬x as a literal. A
Horn clause C is a set of literals containing at most one positive
literal. Given a truth assignment to the variables, we shall say that a
Horn clause is satisfied if it contains at least one literal whose value
is true (clauses are assumed to be in disjunctive form). Every clause
C is associated with a positive weight w(C) ∈ ℝ. A Horn formula
is defined as a conjunction of Horn clauses (hence Horn
formulas are in conjunctive normal form, CNF). A Horn formula is
satisfiable if there is a truth assignment to all literals such that all
clauses are satisfied. As we deal with Horn formulas that might not
be satisfiable, we seek a truth assignment that maximizes
the total weight of the satisfied clauses. An example of a Horn formula
is the following:
(x1 ∨ ¬x2) ∧ (¬x2 ∨ ¬x3), which is equivalent to (x1 ⇐ x2) ∧ (¬x3 ⇐ x2),
where ⇐ denotes logical implication.</p>
      <p>Given a set of relation types R and a set E of entities, a fact f
is defined as a triplet of the form f = (e1, e2, r), which expresses
an instance of a binary relation of type r ∈ R for two entities
e1, e2 ∈ E (e.g., we could denote the fact that “John is married to
Yoko”, where John and Yoko are both entities). Moreover, facts
may be uncertain. Hence every fact f is also associated with a
positive weight w(f) ∈ ℝ, expressing the degree of confidence in the
fact being correct. Moreover, every fact is also associated with a
Boolean variable xf ∈ X, whose value indicates whether the
corresponding fact is true or false. In the following, we simplify
the notation and do not distinguish between variables and facts.
Hence, assigning the value true to a fact f corresponds to
assigning true to the corresponding variable xf.</p>
      <p>We define a knowledge base KB as a triplet KB = {F, C, S},
where F is a set of facts, C is a set of Horn clauses, and S is a
collection of disjoint sets of facts. We shall refer to C and S as soft
deduction rules and hard consistency constraints, respectively (or
soft and hard rules for short).</p>
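      <p>For concreteness, the following minimal Python sketch shows one way to
represent such a knowledge base KB = {F, C, S}; all names here are our own
illustration and are not part of the URDF implementation:</p>
      <preformat>
# Minimal sketch (illustrative, not URDF's actual code) of a knowledge
# base KB = {F, C, S}: weighted facts, soft Horn clauses, and disjoint
# mutual-exclusion sets.
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    relation: str   # r in R
    e1: str         # first entity
    e2: str         # second entity

@dataclass(frozen=True)
class Clause:
    head: Fact      # the single positive literal of the Horn clause
    body: tuple     # facts occurring negated in the clause
    weight: float   # w(C), a positive weight

married = Fact("marriedTo", "John", "Yoko")
lives_j = Fact("livesIn", "John", "NY")
lives_y = Fact("livesIn", "Yoko", "NY")

F = {married: 0.9, lives_j: 0.8, lives_y: 0.1}   # facts with weights w(f)
C = [Clause(lives_y, (married, lives_j), 0.38)]  # a grounded soft rule
S = []                                           # no hard rules in this toy example
      </preformat>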
    </sec>
    <sec id="sec-3">
      <title>2.1 Soft Deduction Rules</title>
      <p>We consider weighted Horn clauses with exactly one positive
head literal as soft rules. A soft rule could state, for example, that
“if Yoko and John are married and John lives in NY, then Yoko also
lives in NY”, with a confidence of 0.38 of being correct, which we
write as follows:
lives(Yoko, NY) ⇐ married(Yoko, John) ∧ lives(John, NY) [0.38]</p>
      <p>To tackle the inherent incompleteness of F, we lift soft rules into
first-order rules with universally quantified variables, which serve
as input to our knowledge base. Soft rules then have the shape
of first-order Horn clauses and can be written as Datalog-style
implications. To express the first-order rule that “married couples
usually live in the same place”, for example, we use the following
compact notation:
livesIn(p1, loc1) ⇐ marriedTo(p1, p2) ∧ livesIn(p2, loc1) [0.38]</p>
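      <p>As a small illustration of this grounding step (a sketch with helper
names of our own, not URDF's API), such a first-order rule can be grounded
by substituting constants for the universally quantified variables wherever
the body atoms match known facts:</p>
      <preformat>
# Sketch: grounding the first-order rule
#   livesIn(p1, loc1) :- marriedTo(p1, p2), livesIn(p2, loc1)  [0.38]
# against a toy fact base (illustrative names, not URDF's API).
facts = {
    ("marriedTo", "Yoko", "John"),
    ("livesIn", "John", "NY"),
}

def ground_marriage_rule(facts):
    """Yield grounded clauses (head, body, weight) for the rule above."""
    for rel1, p1, p2 in facts:
        if rel1 != "marriedTo":
            continue
        for rel2, q, loc in facts:
            if rel2 == "livesIn" and q == p2:
                head = ("livesIn", p1, loc)
                body = (("marriedTo", p1, p2), ("livesIn", p2, loc))
                yield head, body, 0.38

for head, body, w in ground_marriage_rule(facts):
    print(head, ":-", body, w)
# prints: ('livesIn', 'Yoko', 'NY') :- (('marriedTo', 'Yoko', 'John'),
#         ('livesIn', 'John', 'NY')) 0.38
      </preformat>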
    </sec>
    <sec id="sec-4">
      <title>2.2 Hard Consistency Constraints</title>
      <p>The set of facts F may contain inconsistent information. Hence,
we introduce hard consistency constraints that take the shape of a
collection of disjoint sets of mutually exclusive facts S = S1, …, S|S|.
We enforce the constraint that for each hard rule Sk, at most
one fact f ∈ Sk may be assigned the value true. For example, if
we observe two or more different birth dates for a person, clearly
something went wrong either during extraction or when reasoning
with soft rules. We can formally identify such an inconsistency by
formulating a consistency constraint as follows:
(date1 = date2) ⇐ bornOn(p1, date1) ∧ bornOn(p1, date2)
Grounding this hard rule could, for example, yield the
following set of mutually exclusive facts:
{bornOn(John, 1931), bornOn(John, 1940), bornOn(John, 1957)}
which specifies that John could have been born either in 1931, 1940, or
1957, or in none of these years. In contrast to soft rules, hard rules
may not be violated by any truth assignment to the corresponding
Boolean variables.</p>
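      <p>Operationally, a grounded hard rule of this kind is just a set of facts
that agree on everything except the functional argument. The following sketch
(our own simplification) groups bornOn facts by person to obtain the
mutual-exclusion sets:</p>
      <preformat>
# Sketch: deriving mutual-exclusion sets from the functional property of
# bornOn (each person has at most one birth date); illustrative code only.
from collections import defaultdict

facts = [
    ("bornOn", "John", "1931"),
    ("bornOn", "John", "1940"),
    ("bornOn", "John", "1957"),
    ("bornOn", "Yoko", "1933"),
]

groups = defaultdict(set)
for rel, person, date in facts:
    if rel == "bornOn":
        groups[person].add((rel, person, date))

# Every group with at least two facts becomes a hard rule S_k:
# at most one of its facts may be assigned true.
S = [g for g in groups.values() if len(g) >= 2]
print(S)   # one set with John's three conflicting birth dates
      </preformat>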
    </sec>
    <sec id="sec-5">
      <title>2.3 Problem Definition</title>
      <p>As we deal with Horn formulas that might not be satisfiable, we
seek to find a truth assignment that maximizes the total weight of
the satisfied clauses. We call this problem weighted MaxSAT with
soft and hard rules, which we formally define as follows.
We are given a set of facts F = {f1, f2, …, fn}, a collection C =
C1, …, Cm of clauses in Horn form (soft rules), and a collection
of sets S = S1, …, St that partition F (hard rules). Each clause
C ∈ C is associated with a positive weight w(C). We wish to find
a truth assignment to each variable f ∈ F such that for each set
in S at most one fact is assigned the value true, and the sum of the
weights of the satisfied clauses is maximized.</p>
      <p>
        Given a knowledge base KB = {F, C, S} (in grounded form),
we define an instance of the MaxSAT problem with soft and hard
rules as follows. Every fact fi ∈ F is associated with a Boolean
variable. In addition, we introduce a unit clause Ci = {fi}, whose
weight is equal to the confidence of the corresponding fact. For
convenience of notation, for each fact fi, we also introduce a unit
hard rule Si = {fi} into S. As the MaxSAT problem with hard
and soft rules is a generalization of the classic MaxSAT problem
with Horn clauses, which is NP-hard [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], it follows that our
problem is NP-hard as well. Because computing an
optimum solution for the above problem is intractable, we devise an
approximation algorithm.
      </p>
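      <p>Read this way, turning a grounded knowledge base into a MaxSAT
instance is mechanical; a sketch (with helper names of our own) that adds the
unit clauses Ci = {fi} and the unit hard rules Si = {fi}:</p>
      <preformat>
# Sketch of the reduction above (illustrative, not URDF's actual code):
# each fact f_i contributes a unit clause with weight w(f_i), and a
# singleton hard rule if f_i is not already covered by a hard rule.
def build_maxsat_instance(fact_weights, soft_rules, hard_rules):
    clauses = list(soft_rules)        # grounded clauses (head, body, weight)
    for f, w in fact_weights.items():
        clauses.append((f, (), w))    # unit clause C_i = {f_i}
    covered = set().union(*hard_rules) if hard_rules else set()
    sets = [set(s) for s in hard_rules]
    for f in fact_weights:
        if f not in covered:
            sets.append({f})          # unit hard rule S_i = {f_i}
    return clauses, sets

weights = {"bornOn(John,1931)": 0.3, "bornOn(John,1940)": 0.9}
exclusive = [{"bornOn(John,1931)", "bornOn(John,1940)"}]
clauses, sets = build_maxsat_instance(weights, [], exclusive)
print(len(clauses), sets)   # 2 unit clauses; 1 mutual-exclusion set
      </preformat>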
      <p>We remark that instead of hard rules, one could also enforce
consistency constraints by introducing soft rules with high weights.
However, in combination with an approximation algorithm, this
approach may involve non-trivial issues, as illustrated by the following
example. Consider the two facts bornOn(John, 1931)
and bornOn(John, 1940), whose weights are 0 and w ≫ 0,
respectively. In order to enforce the hard rule that John can only have
one date of birth, we could introduce the soft rule (¬x ∨ ¬y),
where x and y are the Boolean variables associated with the facts
bornOn(John, 1931) and bornOn(John, 1940), respectively.
However, it is not clear how to determine the weight W of such a soft
rule. If W is not large enough, then we cannot enforce the hard
consistency constraint. On the other hand, if W is too large, then
any (1 + ε)-approximation algorithm (for ε &gt; 0) might assign true
to bornOn(John, 1931), as the ratio between such a solution and
the optimum is W / (W + w).</p>
    </sec>
    <sec id="sec-6">
      <title>ALGORITHM</title>
      <p>
        Our algorithm is inspired by Johnson’s approximation algorithm
for the original weighted MaxSAT [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] problem (with no additional
consistency constraints). This is based on the method of conditional
probabilities [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] which allows one to convert a randomized
approximation algorithm into a deterministic one, while preserving the
approximation guarantee.
      </p>
      <p>
        The first step is to compute a real value pi in [0, 1] for each fact
fi, satisfying the following property: the sum of all pi’s
corresponding to the facts within the same hard rule is at most 1. Each
pi is interpreted as the probability that fi is assigned true and is
computed in such a way that pi is large when the confidence in fact
fi being true is high (i.e., w(fi) is large) and small otherwise. A
simple algorithm for computing the pi’s proceeds as follows: for
each hard rule S, with probability 1/2 assign true to the fact with the
largest weight in S, and with probability 1/2 assign false to all facts
in S. This gives a 1/2-approximation algorithm; however, there
are more sophisticated algorithms which work better in practice,
while maintaining the approximation guarantee (see our technical
report for more details [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]). Then, at each step t, we consider a
hard rule St and determine a truth assignment for all facts in St
that maximizes a carefully defined potential function. Our
potential function can be interpreted as the expected total weight of
satisfied clauses, with each unassigned fact fi being assigned true
with probability pi (independently of the facts not in St).
      </p>
      <p>Formally, we denote by Wt the value of our potential function
at step t. At the beginning of our algorithm, all facts are unassigned
and the value of our potential function (W0) is defined as
W0 = Σ_{C ∈ C} w(C) · P(C is satisfied),
where the probability that a clause is satisfied is a function of the
pi’s computed at the beginning of our algorithm.</p>
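      <p>Under this independence assumption, a Horn clause fails only if its
head is false and every body fact is true, so P(C is satisfied) = 1 − (1 −
p_head) · Π p_body. A short sketch (our own code, not URDF's) of this
computation and of W0:</p>
      <preformat>
# Sketch: probability that a Horn clause (head, body, weight) is satisfied
# when each unassigned fact f is true independently with probability p[f].
def prob_satisfied(head, body, p):
    # The clause is falsified only if the head is false and every
    # negated (body) literal is falsified, i.e., every body fact is true.
    prob_unsat = (1.0 - p[head]) if head is not None else 1.0
    for f in body:
        prob_unsat *= p[f]
    return 1.0 - prob_unsat

def potential_w0(clauses, p):
    # W0 = sum over all clauses C of w(C) * P(C is satisfied)
    return sum(w * prob_satisfied(head, body, p) for head, body, w in clauses)
      </preformat>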
      <p>At step t, let f̂1, …, f̂t−1 be the truth assignment determined so far for
the facts f1, …, ft−1, and let St be the hard rule considered
at step t. We denote by St = false the truth assignment that
assigns false to all facts in St. For every f in St, we define
Wt,f=true = Σ_{C ∈ C} w(C) · P(C is sat. | f̂1, …, f̂t−1, f = true),
where P(A | B) denotes a conditional probability. When all facts
in St are assigned false, our potential function is denoted as
Wt,St=false = Σ_{C ∈ C} w(C) · P(C is sat. | f̂1, …, f̂t−1, St = false).
Our algorithm determines the truth assignment that maximizes the
current potential function by choosing the largest value among all
Wt,f=true’s and Wt,St=false, and then assigns the corresponding
truth values to the facts in St. At each iteration, all satisfied clauses
are removed from the set of clauses.</p>
      <p>
        Our algorithm stops when all facts have been assigned a truth
value. We remark that our algorithm is completely deterministic
(i.e., it always produces the same output given the same input).
Algorithm 1 shows pseudocode for our algorithm, while the
approximation guarantee of our algorithm is stated in Theorem 1.
      </p>
      <p>Algorithm 1 Weighted MaxSAT with Soft and Hard Rules
1: For each hard rule, compute a probability distribution over its facts
so that for each fact fi, pi is proportional to w(fi), see [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ].
2: For each hard rule St ∈ S:
3: Let f be the fact with the largest Wt,f=true among all facts in St.
If Wt,f=true ≥ Wt,St=false, then assign true to f;
else assign false to all facts in St.
4: Remove all satisfied clauses.</p>
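      <p>The following is a compact, runnable rendition of Algorithm 1 under
our reading of the paper (a sketch, not the actual URDF code): it uses the
simple distribution with pi proportional to w(fi) instead of the refined one
from [27], represents clauses as (head, body, weight) Horn triples with body
facts negated, and assumes that every fact occurs in exactly one hard rule,
as guaranteed by the unit hard rules of Section 2.3:</p>
      <preformat>
# Sketch of Algorithm 1 (simplified, illustrative implementation).
def maxsat_soft_hard(clauses, hard_rules, weight):
    # Step 1: per hard rule, probabilities p_f proportional to w(f); they
    # sum to 1 within each mutual-exclusion set (hence at most 1).
    p = {}
    for S in hard_rules:
        total = sum(weight[f] for f in S)
        for f in S:
            p[f] = weight[f] / total if total > 0 else 0.0

    assignment, live = {}, list(clauses)

    def expected_weight(trial):
        # Potential: expected total weight of the still-live clauses, given
        # the facts fixed in `trial` and probabilities p for the rest.
        total = 0.0
        for head, body, w in live:
            prob_unsat = 1.0
            literals = [(head, True)] if head is not None else []
            literals += [(f, False) for f in body]
            for f, positive in literals:
                if f in trial:
                    if trial[f] == positive:  # literal satisfied for sure
                        prob_unsat = 0.0
                        break
                    # literal falsified for sure: factor 1, keep going
                else:
                    prob_unsat *= (1.0 - p[f]) if positive else p[f]
            total += w * (1.0 - prob_unsat)
        return total

    for S in hard_rules:              # Steps 2-3: one hard rule per step
        options = [{g: (g == f) for g in S} for f in S]   # one fact true
        options.append({g: False for g in S})             # or all false
        best = max(options, key=lambda o: expected_weight({**assignment, **o}))
        assignment.update(best)
        live = [c for c in live if not satisfied(c, assignment)]  # Step 4
    return assignment

def satisfied(clause, assignment):
    head, body, _ = clause
    if head is not None and assignment.get(head) is True:
        return True
    return any(assignment.get(f) is False for f in body)
      </preformat>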
      <sec id="sec-6-1">
        <title>THEOREM 1. Given a set C of Horn clauses and a set S of</title>
        <p>hard rules, let be the minimum number of negated literals in any</p>
        <sec id="sec-6-1-1">
          <title>Horn clause that has at least two literals. Our algorithm is a p</title>
          <p>approximation algorithm for the MaxSAT problem with soft and
hard rules, where p is obtained by solving the equation p = 1 p .</p>
        </sec>
        <sec id="sec-6-1-2">
          <title>The running time of our algorithm is O(mn) in the worst case, with</title>
          <p>m = Pc2C jcj and n = Ps2S jsj.</p>
        </sec>
        <sec id="sec-6-1-3">
          <title>COROLLARY 1. Our algorithm has an approximation guaran</title>
          <p>tee of 1=2 for general Horn clauses.</p>
          <p>
            Due to space constraints, we defer the proof of Theorem 1 to
[
            <xref ref-type="bibr" rid="ref27">27</xref>
            ]. We are not aware of any closed form formula to express the
solutions of the equation p = 1 p as a function of . In the case
= 2 we obtain p 0:618, while in the case = 3 we obtain
a ratio of roughly 0:68. We can also show an approximation
guarantee of 0:83 in some cases of interest (see [
            <xref ref-type="bibr" rid="ref27">27</xref>
            ]). Our algorithm
reaches its worst case running time when every fact occurs in every
grounded soft rule, i.e., when jCj = jF j; 8C 2 C. In practice, this
case is unlikely, and in fact our experiments confirm the efficiency
of our algorithm in real-world applications (see Figure 1a).
4.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>REASONING FRAMEWORK</title>
      <p>In the absence of any soft and hard rules, URDF conforms to
a standard (conjunctive) SPARQL engine where all facts in F are
assumed to be true. Our key observation for query answering in
combination with MaxSAT solving is that often only a small
subset of the facts in F (often several orders of magnitude fewer facts
than are contained in the entire knowledge base) is relevant
for answering a specific query and for finding the corresponding
truth assignments to the facts that are related to the query. For this
purpose, we define the dependency graph DQ ⊆ F of a query Q
as follows.</p>
      <p>DEFINITION 1. Given a knowledge base KB = {F, C, S} and
a conjunctive query Q, then:
(1) all possible groundings fi ∈ F of the query atoms qj ∈ Q are facts
in the dependency graph DQ of Q;
(2) if a grounded fact fn ∈ F is in DQ, then all grounded facts
f1, …, fn−1 ∈ F of all grounded soft rules C ∈ C, in which fn is the
head, are also in DQ;
(3) if a grounded fact fi ∈ F is in DQ, then all grounded facts
f1, …, fk ∈ F of the grounded hard rule S ∈ S, which are mutually
exclusive to fi, are also in DQ.</p>
      <p>
        Definition 1 already yields a recursive algorithm to compute the
dependency graph, which is similar to the SLD resolution [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] used in
Datalog and Prolog. In our case, SLD resolution is extended by
an additional grounding step for hard rules, i.e., whenever we
ground a fact fi, we also iteratively ground all hard rules that are
related to it, using fi as a new subquery. (We employ a form of
tabling, i.e., memoization, in order to cache redundant subgoals. This
table also serves to break potential cycles in SLD resolution if the same
rule is attempted to be grounded repeatedly with the same bindings of
variables to constants.) The URDF reasoning
steps are summarized in Algorithm 2.
      </p>
      <p>Algorithm 2 URDF Reasoning Framework
Require: A knowledge base KB with base facts F, first-order soft
rules C and first-order hard rules S, and a conjunctive query Q
1: Initialize the dependency graph DQ = ∅.
2: Ground all literals qi ∈ Q via SLD resolution and add their
intersection to DQ.
3: Let CQ, SQ denote the sets of soft and hard rules grounded for
answering Q.
4: Expand DQ by all facts f in the grounded rules CQ and SQ.
5: Construct a CNF formula over the grounded clauses CQ and
individual facts DQ ⊆ F.
6: Solve the constrained weighted MaxSAT problem over the CNF and the
sets SQ (Algorithm 1).
7: return DQ with truth assignments to all facts f ∈ DQ</p>
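      <p>A skeletal version of this grounding procedure illustrates both the
recursion of Definition 1 and the tabling used to break cycles. This is our
own simplification: rules_with_head stands in for full SLD resolution with
unification, and exclusion_set returns the facts that are mutually exclusive
to a given fact.</p>
      <preformat>
# Sketch of the dependency-graph construction behind Algorithm 2
# (illustrative only; real SLD resolution also performs unification).
def build_dependency_graph(query_facts, rules_with_head, exclusion_set):
    dq = set()                  # the dependency graph D_Q (as a fact set)
    stack = list(query_facts)   # groundings of the query atoms
    while stack:
        f = stack.pop()
        if f in dq:             # tabling/memoization: skip repeated
            continue            # subgoals, which also breaks cycles
        dq.add(f)
        for body in rules_with_head(f):   # grounded soft rules with head f
            stack.extend(body)
        stack.extend(exclusion_set(f))    # facts in f's hard rule
    return dq
      </preformat>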
        <p>
          Given a set of first-order rules, this form of deductive grounding
has a well-known polynomial runtime for non-recursive Datalog
programs as well as for linear, recursive programs. It
however has an exponential complexity (in the number of facts) already
for Datalog programs with a single, non-linear, recursive rule [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
Line 3 denotes the rules that were grounded during this resolution
phase in order to construct a Boolean formula in conjunctive
normal form (CNF). These grounded rules are already available from
the previous SLD resolution and can be kept in a simple buffer of
the algorithm. The CNF construction in Line 5 itself is linear in the
size of the grounded rules CQ and SQ, because all grounded soft
rules are already in clause form, while the grounded hard rules can
be input into our MaxSAT solver directly as plain sets of mutually
exclusive facts. The next step, in Line 6, requires the execution of
Algorithm 1 for the weighted MaxSAT problem (with both soft and
hard rules) described in Section 3.
        </p>
        <p>We remark that the above form of dependency graph
construction guarantees truth assignments to query answers that are
consistent with the truth assignments that would be found by the MaxSAT
solver for the entire sets of facts F, clauses C, and constraints S.
That is, the truth assignments to the facts in the dependency graph
DQ ⊆ F for any query Q after MaxSAT solving are equivalent to
the truth assignments that would be obtained for these facts when
running the MaxSAT solver over the entire set of facts F in the
knowledge base (modulo ties and possible MaxSAT approximation
errors).</p>
    </sec>
    <sec id="sec-8">
      <title>EXPERIMENTS</title>
      <p>The following experiments were run on an AMD Opteron
Quad-Core 2.6 GHz server with 16 GB RAM, using Oracle 11g as the
storage back-end for the underlying knowledge base. Physical I/Os
were cached (thus aiming to eliminate variance due to disk
operations) by running each query once and then taking the average
runtime over 5 immediately repeated runs. Memory consumption
was generally not a limiting factor, with up to 501 MB overall
memory consumption for our URDF Java implementation
(including the high overhead of the Java VM) and less than 10 MB for the
Alchemy package (see Subsection 5.2), implemented in C++.</p>
    </sec>
    <sec id="sec-9">
      <title>5.1 YAGO Knowledge Base, Rules and Queries</title>
      <p>
        The semantic graph of YAGO [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] serves as basis for our
experiments. YAGO is a large common-sense knowledge base that
has been extracted automatically from Wikipedia articles. YAGO
contains more than 2 million entities and 19 million facts. The
facts include a class hierarchy of 200,000 classes with about 100
distinct relation types. Moreover, we employ 16 (partially
recursive) hand-crafted soft deduction rules of common-sense reasoning
about people and their relationships, together with 10 queries of
different shapes. We enforce functional properties of the predicates
bornIn, bornOnDate, diedIn, diedOnDate and marriedTo as
consistency constraints. As weights for base facts, we employ the
confidence weights provided by YAGO, while the weight for a soft rule
is calculated as a conditional probability for the entire rule to hold
(including the head literal), given that the body of the rule holds
(when grounded over YAGO, see [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] for a detailed description
of rules and queries). Queries were chosen such that many query
predicates are defined via deduction rules, which led to a recursion
depth of up to 7 in our experiments.
      </p>
    </sec>
    <sec id="sec-10">
      <title>5.2 Markov Logic: MAP and MC-SAT</title>
      <p>
        Alchemy (http://alchemy.cs.washington.edu/) provides a series of algorithms for statistical relational
learning and probabilistic inference based on Markov Logic
Networks [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. It implements various MCMC sampling techniques,
including MAP inference [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] (which is a memory-efficient
stochastic MaxSAT solver based on MaxWalkSAT) and MC-SAT [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
MAP inference yields truth assignments to facts (which allows for
precision comparisons with URDF), whereas MC-SAT computes
probability distributions over the underlying Markov network (which
URDF does not do). Thus, we merely refer to MC-SAT for runtime
comparisons as a state-of-the-art technique for MCMC sampling. We
found grounding the above rules and queries over the entire YAGO
knowledge base in Alchemy under an open-world assumption not
to be feasible, due to the nearly quadratic inflation of the resulting
MLN structure. Hence, we provide the facts and rules grounded by
URDF (via SLD resolution) directly as input to Alchemy, which
effectively leads to a closed-world grounding of rules in the
corresponding MLN structure. MLN running times (for MAP and
MC-SAT) thus mostly correspond to the time needed for inference
by Alchemy over this much smaller network structure (plus some
overhead for parsing the formulas and grounding the network in
closed-world mode).
      </p>
      <sec id="sec-10-1">
        <title>5.2.1 Basic Query Processing over YAGO</title>
        <p>The first setting reports running times and result precision for the
URDF reasoner compared to MAP inference and MC-SAT over the
basic YAGO setting. As for approximation quality, we measure the
relative precision of the URDF MaxSAT solver compared to the
MAP baseline: if the MaxSAT weight computed by URDF
(MAXSAT-W ) is at least as large as the weight achieved by MAP
inference (MAP-W ), and none of the hard constraints are violated by
either approach, we conclude that we found an equally good or even
better solution. Grounding time (SLD) denotes the time needed to
ground the query atoms, soft and hard rules, and to expand the
dependency graph via a top-down SLD resolution. In the following,
#C and #S denote the number of grounded soft and hard rules
(including unit clauses), while |C| and |S| denote the number of
occurrences of facts in all grounded soft and hard rules,
respectively. The overall query response time (in ms) for URDF is the
sum of SLD grounding and MaxSAT solving. Table 1 shows that
the URDF MaxSAT solver achieves more than two orders of
magnitude runtime improvement over MAP and MC-SAT, at 90
percent precision compared to the MAP baseline for queries Q1–Q10.
That is, for 9 out of the 10 queries URDF finds the same MaxSAT
weight as MAP, while only for query Q7, the weight returned by
URDF is marginally worse. We also see that running MC-SAT is
generally more expensive than MAP inference.</p>
        <p>In this basic setting, SLD grounding clearly dominates query response times for URDF.
However, for all queries we achieve interactive response times with
an overall runtime of at most 3 seconds per query.
</p>
      </sec>
      <sec id="sec-10-2">
        <title>5.2.2 Synthetic Rule Expansions</title>
        <p>To systematically investigate the asymptotic behavior of our
algorithm for large dependency graphs and more complex rules, we
employ synthetic expansions over the basic YAGO setting. In the
first expansion setting (Figure 1(a)), we expand each grounded soft
rule by 10 distinct facts as noise per expansion step, for each of the
query results depicted in Table 1. For the noisy facts, we apply
uniform weights as well as weights following Gaussian distributions (with
σ² = 1) around the original weights (μ) of the YAGO facts. We thus
simulate very complex CNF formulas with more than 10⁵
occurrences of facts in soft rules. In the following plots, the grounding
time also includes the time for expanding the CNF formulas with
these noisy facts and thus is not constant across runs. Figure 1(a)
confirms that the runtime of the MaxSAT algorithm is not affected
by the weighting schemes and remains equally efficient.</p>
        <p>In the second expansion setting (Figure 1(b)), we not only
vary the number of facts per soft rule, but also the number of
grounded soft and hard rules, by replicating rules with different
combinations of noisy facts. That is, for each original grounded fact, we
introduce 10 mutually exclusive facts as noise, and, for each soft
rule, we expand the CNF by introducing 10 randomly expanded soft
rules at each expansion step. Overlap among soft rules is achieved
by uniform-randomly drawing the noisy facts from a static pool of
1,000 synthetic facts for all expansions. We keep Gaussian weights
for the expanded facts. This setting yields very large CNF formulas
with |C| + |S| ranging from 2,312 to 2,321,488. More specifically,
we create constraints with up to 21,277 soft rules and 2,311,551
occurrences of facts in all soft rules, over an overall amount of only
9,937 distinct facts for the last expansion step. In this step, each
fact on average occurs in more than 232 rules, each with an
average rule length of 108 facts. URDF solves the CNF formulas for
this expansion in 15.7 seconds, while remaining at a higher
precision in computing the MaxSAT weight than MAP. We measure the
approximation precision only for the first three expansion steps,
yielding MaxSAT weights of 853.74, 975.02, and 1,099.34 for URDF
and 854.58, 937.74, and 1,069.70 for MAP, respectively. MAP does
not scale to CNFs larger than in these first three expansions, while
MC-SAT cannot be run beyond the first expansion step.</p>
      </sec>
      <sec id="sec-10-3">
        <title>5.2.3 Inductively Learned Rules</title>
        <p>
          The previous experiments were run over the entire YAGO
knowledge base, using 16 handcrafted rules in a fully recursive fashion.
To measure the cost of deduction over such a large knowledge base
with more general rules, we conduct runs over 42 recursive rules
learned inductively by a variant of FOIL [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], using YAGO as
background knowledge. Figure 2(a) depicts the runtimes (in seconds)
for grounding queries Q1–Q10 over YAGO using these rules,
with SLD resolution cut off at different deduction levels.
        </p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>RELATED WORK</title>
        <p>
          Reasoning with inconsistency and uncertainty has been gaining
significant attention in the Database, Semantic Web, and Machine
Learning communities lately. In contrast to related works on
uncertain and probabilistic databases, we consider a more dynamic
way of querying and reasoning with deduction rules for intensional
relations. All database approaches for uncertain data we are aware
of (for example Trio [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ], MayBMS [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], and MystiQ [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]) limit
queries to materialized data. Some systems support views [
          <xref ref-type="bibr" rid="ref2 ref29">2, 29</xref>
          ],
but only in materialized form, which is equivalent to comprehensive
and thus expensive bottom-up grounding in Datalog. Moreover,
we also found recent approaches that adapt inference methods for
probabilistic graphical networks to database systems, such as [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]
or [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ], to be primarily designed for batch processing and not to
be well suited for interactive querying. A similar observation also
holds for Probabilistic Logic Programming (PLP) and Answer Set
Programming (ASP). PLP combines logic programming
(including Datalog) with probabilistic reasoning. ProbLog [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] employs
SLD resolution [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] for grounding first-order rules, together with
Monte Carlo sampling or binary decision diagrams for
probabilistic inference over Boolean formulas obtained from SLD proofs.
ASP, on the other hand, is a more generic paradigm for solving
combinatorial problems with logic programs. In [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], the authors
study tractable subclasses of ASP with cardinality constraints and
weights. In Probabilistic Description Logic [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] (PDL) (see [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]
for an overview), probability ranges can be attached to subsumption
(or instance-of) statements. PDL generalizes the description logic
SHOIN(D) and can thus express functional rules. However, PDL
cannot deal with truly inconsistent input statements. Thus, in order
to apply PDL to a knowledge base with noisy confidence scores in
place of the probabilities, the probability ranges would have to be
reconciled upfront—which amounts to solving the inconsistencies
before starting the reasoner.
        </p>
        <p>
          Statistical Relational Learning (SRL) has been gaining a strong
momentum in both the Database and Machine Learning
communities, with Markov Logic Networks (MLNs) [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] probably being
one of the most generic approaches for combining first-order logic
and probabilistic graphical models. In these classes of graphical
models, sampling methods based on Markov Chain Monte Carlo
(MCMC) provide a family of efficient approximation algorithms
for probabilistic inference. Specifically, the algorithms employed
in Markov Logic for maximum-a-posteriori (MAP) inference [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]
and MC-SAT [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] constitute two state-of-the-art extensions to
MaxWalkSAT-based, stochastic MaxSAT solvers [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] and Gibbs-style
sampling techniques, respectively. Our approach diverges from
Markov Logic in two basic aspects: grounding and inference.
Grounding a first-order Markov network under open-world semantics works
by binding all entities (constants) to variables in the first-order rules
that match the type signature of the respective predicates. For
binary predicates, this may result in grounded networks which are
often nearly quadratic in the number of entities in the knowledge
base. Unlike Markov Logic, we specifically focus on query-time
reasoning, with a deductive (closed-world) grounding of soft rules.
        </p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ], we presented a “self-organizing framework for
information extraction”, where the main problem was also
formulated as a MaxSAT problem. The main difference between this work
and [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] is that here we deal with hard and soft rules in a more
principled way, while in [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] hard rules are enforced by introducing soft
rules with large weights (see Section 2 for a discussion about why
a more principled approach is needed). Moreover, we present an
algorithm with an approximation guarantee of 0.83 in some cases
of interest. The MaxSAT problem is well studied in the theoretical
computer science community (see for example [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]). Here, we focus
on the effectiveness of our algorithms in solving real-life problems.
In [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], we presented an initial demo of our interactive reasoning
framework.
        </p>
    </sec>
    <sec id="sec-11">
      <title>CONCLUSIONS</title>
      <p>We presented a query-time reasoning approach for uncertain RDF
data and SPARQL queries over a combination of soft deduction
rules and hard consistency constraints. URDF employs a
generalized weighted MaxSAT algorithm that guarantees consistent query
answers with regard to the hard constraints. Our experiments
confirm that our MaxSAT approximation algorithm yields interactive
response times over formulas with many thousands of grounded
rules and facts, while empirically performing much better than the
proven lower bound of 1/2 for the approximation guarantee. We
also saw that, in many cases, the grounding time for the
Datalog-style soft deduction rules is the actual limiting factor for query
response times. Our future work will investigate how to further
scale up inference by combining dynamic grounding over the
highly transient parts of the knowledge base with materialized facts
for the more static parts, as well as by distributing our grounding
and MaxSAT-based inference strategies.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Alon</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Spencer</surname>
          </string-name>
          .
          <article-title>The Probabilistic Method</article-title>
          . John Wiley,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Antova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Koch</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Olteanu</surname>
          </string-name>
          . MayBMS:
          <article-title>Managing incomplete information with probabilistic world-set decompositions</article-title>
          .
          <source>In ICDE</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Apt</surname>
          </string-name>
          and
          <string-name>
          <string-name>
            <given-names>M. H.</given-names>
            <surname>van Emden</surname>
          </string-name>
          </string-name>
          .
          <article-title>Contributions to the theory of logic programming</article-title>
          .
          <source>J. ACM</source>
          ,
          <volume>29</volume>
          (
          <issue>3</issue>
          ):
          <fpage>841</fpage>
          -
          <lpage>862</lpage>
          ,
          <year>1982</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          , G. Kobilarov,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z. G.</given-names>
            <surname>Ives</surname>
          </string-name>
          . DBpedia:
          <article-title>a nucleus for a web of open data</article-title>
          .
          <source>In ISWC</source>
          , pages
          <fpage>722</fpage>
          -
          <lpage>735</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Boulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Dalvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mandhani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mathur</surname>
          </string-name>
          , C. Ré, and
          <string-name>
            <given-names>D.</given-names>
            <surname>Suciu</surname>
          </string-name>
          .
          <article-title>MYSTIQ: a system for finding more answers by using probabilities</article-title>
          .
          <source>In SIGMOD</source>
          , pages
          <fpage>891</fpage>
          -
          <lpage>893</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Carlson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Betteridge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. R.</given-names>
            <surname>Hruschka Jr.</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Mitchell</surname>
          </string-name>
          .
          <article-title>Coupled semi-supervised learning for information extraction</article-title>
          .
          <source>In WSDM</source>
          , pages
          <fpage>101</fpage>
          -
          <lpage>110</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-A.</given-names>
            <surname>Ratinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rizzolo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Roth</surname>
          </string-name>
          .
          <article-title>Learning and inference with constraints</article-title>
          .
          <source>In AAAI</source>
          , pages
          <fpage>1513</fpage>
          -
          <lpage>1518</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. X.</given-names>
            <surname>Goemans</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Williamson</surname>
          </string-name>
          .
          <article-title>New 3/4-approximation algorithms for the maximum satisfiability problem</article-title>
          .
          <source>SIAM J. Discr. Math.</source>
          ,
          <volume>7</volume>
          (
          <issue>4</issue>
          ):
          <fpage>656</fpage>
          -
          <lpage>666</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Gottlob</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Papadimitriou</surname>
          </string-name>
          .
          <article-title>On the complexity of single-rule Datalog queries</article-title>
          .
          <source>Inf. Comput.</source>
          ,
          <volume>183</volume>
          :
          <fpage>104</fpage>
          -
          <lpage>122</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Heflin</surname>
          </string-name>
          .
          <article-title>LUBM: A benchmark for OWL knowledge base systems</article-title>
          .
          <source>Journal of Web Semantics</source>
          ,
          <volume>3</volume>
          (
          <issue>2</issue>
          -3):
          <fpage>158</fpage>
          -
          <lpage>182</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B.</given-names>
            <surname>Jaumard</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Simeone</surname>
          </string-name>
          .
          <article-title>On the complexity of the maximum satisfiability problem for Horn formulas</article-title>
          .
          <source>Information Processing Letters</source>
          ,
          <volume>26</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Johnson</surname>
          </string-name>
          .
          <article-title>Approximation algorithms for combinatorial problems</article-title>
          .
          <source>Journal of Computer and System Sciences</source>
          , pages
          <fpage>256</fpage>
          -
          <lpage>278</lpage>
          ,
          <year>1974</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kautz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Selman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jiang</surname>
          </string-name>
          .
          <article-title>A general stochastic approach to solving problems with hard and soft constraints</article-title>
          .
          <source>In The Satisfiability Problem: Theory and Applications</source>
          , pages
          <fpage>573</fpage>
          -
          <lpage>586</lpage>
          . American Mathematical Society,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lukasiewicz</surname>
          </string-name>
          .
          <article-title>Probabilistic description logic programs</article-title>
          .
          <source>Int. J. Approx. Reasoning</source>
          ,
          <volume>45</volume>
          (
          <issue>2</issue>
          ):
          <fpage>288</fpage>
          -
          <lpage>307</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lukasiewicz</surname>
          </string-name>
          and
          <string-name>
            <given-names>U.</given-names>
            <surname>Straccia</surname>
          </string-name>
          .
          <article-title>Managing uncertainty and vagueness in description logics for the semantic web</article-title>
          .
          <source>J. of Web Semantics</source>
          , (
          <volume>6</volume>
          ):
          <fpage>291</fpage>
          -
          <lpage>308</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>McCallum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Schultz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          . FACTORIE:
          <article-title>Probabilistic programming via imperatively defined factor graphs</article-title>
          .
          <source>In NIPS</source>
          , pages
          <fpage>1249</fpage>
          -
          <lpage>1257</lpage>
          .
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>T.</given-names>
            <surname>Meiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dylla</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Theobald</surname>
          </string-name>
          .
          <article-title>Interactive reasoning in uncertain RDF knowledge bases</article-title>
          .
          <source>In CIKM</source>
          , pages
          <fpage>2557</fpage>
          -
          <lpage>2560</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>F.</given-names>
            <surname>Niu</surname>
          </string-name>
          , C. Ré,
          <string-name>
            <given-names>A.</given-names>
            <surname>Doan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Shavlik</surname>
          </string-name>
          . Tuffy:
          <article-title>Scaling up statistical inference in Markov Logic Networks using an RDBMS</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>4</volume>
          (
          <issue>6</issue>
          ):
          <fpage>373</fpage>
          -
          <lpage>384</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>R.</given-names>
            <surname>Pichler</surname>
          </string-name>
          , S. Rümmele, S. Szeider, and
          <string-name>
            <given-names>S.</given-names>
            <surname>Woltran</surname>
          </string-name>
          .
          <article-title>Tractable answer-set programming with weight constraints: Bounded treewidth is not enough</article-title>
          .
          <source>In KR</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>H.</given-names>
            <surname>Poon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Domingos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Sumner</surname>
          </string-name>
          .
          <article-title>A general method for reducing the complexity of relational inference and its application to MCMC</article-title>
          .
          <source>In AAAI</source>
          , pages
          <fpage>1075</fpage>
          -
          <lpage>1080</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Quinlan</surname>
          </string-name>
          .
          <article-title>Learning logical definitions from relations</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>5</volume>
          :
          <fpage>239</fpage>
          -
          <lpage>266</lpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Raedt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kimmig</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Toivonen</surname>
          </string-name>
          . ProbLog:
          <article-title>A probabilistic Prolog and its application in link discovery</article-title>
          .
          <source>In IJCAI</source>
          , pages
          <fpage>2462</fpage>
          -
          <lpage>2467</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Richardson</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Domingos</surname>
          </string-name>
          .
          <article-title>Markov Logic Networks</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>62</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>107</fpage>
          -
          <lpage>136</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>P.</given-names>
            <surname>Singla</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Domingos</surname>
          </string-name>
          .
          <article-title>Memory-efficient inference in relational domains</article-title>
          .
          <source>In AAAI</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          , G. Kasneci, and
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          . YAGO:
          <article-title>A core of semantic knowledge</article-title>
          .
          <source>In WWW</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sozio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          . SOFIE:
          <article-title>A self-organizing framework for information extraction</article-title>
          .
          <source>In WWW</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>M.</given-names>
            <surname>Theobald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sozio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Nakashole</surname>
          </string-name>
          . URDF:
          <article-title>An efficient reasoning framework for uncertain knowledge bases with hard and soft rules</article-title>
          .
          <source>Technical report, Max-Planck-Institute for Informatics</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Wick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>McCallum</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Miklau</surname>
          </string-name>
          .
          <article-title>Scalable probabilistic databases with factor graphs and MCMC</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ):
          <fpage>794</fpage>
          -
          <lpage>804</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>J.</given-names>
            <surname>Widom</surname>
          </string-name>
          .
          <article-title>Trio: A system for integrated management of data, accuracy, and lineage</article-title>
          .
          <source>In CIDR</source>
          , pages
          <fpage>262</fpage>
          -
          <lpage>276</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>