<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Query-Time Reasoning in Uncertain RDF Knowledge Bases with Soft and Hard Rules</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Ndapandula</forename><surname>Nakashole</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Mauro</forename><surname>Sozio</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Fabian</forename><surname>Suchanek</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Martin</forename><surname>Theobald</surname></persName>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="department">Max-Planck Institute for Informatics</orgName>
								<address>
									<settlement>Saarbrücken</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="department">Institut Mines-Telecom</orgName>
								<orgName type="institution">Telecom ParisTech</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="institution">CNRS</orgName>
								<address>
									<settlement>Paris</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff3">
								<orgName type="department">Max-Planck Institute for Informatics</orgName>
								<address>
									<settlement>Saarbrücken</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff4">
								<orgName type="department">Max-Planck Institute for Informatics</orgName>
								<address>
									<settlement>Saarbrücken</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Query-Time Reasoning in Uncertain RDF Knowledge Bases with Soft and Hard Rules</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">28BF54311489646548346ED48E65A24C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T06:55+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Uncertain RDF</term>
					<term>Soft and Hard Rules</term>
					<term>Deductive Grounding</term>
					<term>MaxSAT</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Recent advances in information extraction have paved the way for the automatic construction and growth of large, semantic knowledge bases from Web sources. However, the very nature of these extraction techniques entails that the resulting RDF knowledge bases may contain a significant amount of incorrect, incomplete, or even inconsistent (i.e., uncertain) factual knowledge, which makes efficient query answering over this kind of uncertain RDF data a challenge. Our engine, coined URDF, augments first-order reasoning by a combination of soft rules (Datalog-style implications), which are grounded in a deductive fashion in order to derive new facts from existing ones, and hard rules (mutual-exclusiveness constraints), which enforce additional consistency constraints among both base and derived facts. At the core of our approach is an efficient approximation algorithm for this constrained form of the weighted MaxSAT problem with soft and hard rules, allowing us to dynamically resolve inconsistencies directly at query time. Experiments on real-world and synthetic data confirm the high robustness and the significantly improved runtimes of our framework in comparison to state-of-the-art MCMC techniques.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">INTRODUCTION</head><p>The recent advent of information extraction techniques has enabled the automatic construction and growth of large, semantic knowledge bases from Web sources. Knowledge bases such as DBpedia <ref type="bibr" target="#b3">[4]</ref>, YAGO <ref type="bibr" target="#b24">[25]</ref>, Freebase.com, and TrueKnowledge.com consist of many millions or even billions of facts, which are usually captured in the form of RDF-style subject-predicate-object (SPO) triples. Moreover, the Linked-Data initiative (LinkedData.org) encompasses these and many other RDF datasets, along with extensive cross-linkage in the form of owl:sameAs properties between entities in different data sources. For high coverage of entities and their properties, it is natural to use automated, often heuristic or probabilistic, methods to populate these knowledge bases, and, perhaps, also to automatically establish owl:sameAs links. Consequently, these data sources may contain a significant fraction of noise and incorrect triples. However, even if we accept such data errors and inconsistencies, no knowledge base can ever be complete, and even the entire Linked-Data cloud can hardly ever cover all interesting properties of relevant entities.</p><p>Research on knowledge base construction has adopted probabilistic models and statistical learning for cleaning the gathered pool of raw fact candidates (typically extracted from Wikipedia and textual or semi-structured Web pages). A powerful instrument to this end is reasoning with consistency constraints (see, e.g., <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b15">16,</ref><ref type="bibr" target="#b22">23,</ref><ref type="bibr" target="#b25">26]</ref>). For example, specifying that a person can have only one spouse (at a given snapshot in time) helps distinguish marriages from romantic affairs and remove false hypotheses for isMarriedTo triples. On the other hand, keeping only those triples with the highest confidence (or likelihood) of being correct is an unduly eager and conservative approach. For example, when searching for musician couples where both partners have won a Grammy award, correct answers, such as John Lennon and Yoko Ono, may well be composed out of low-confidence input triples, as the join predicates in the query impose additional restrictions and can implicitly "de-noise" the answers. For example, the Lennon/Ono marriage is not in any of the Wikipedia infoboxes, so the respective extractions from free text would have lower confidence.</p><p>When query answers are empty because critical pieces of knowledge are missing, reasoning over uncertain facts can be helpful to produce at least likely or speculative results. To this end, deduction rules are an instrument to infer answers that go beyond the extensionally represented content of the knowledge base. These rules may themselves be uncertain as well. For example, suppose we want to find the doctoral advisor of Alon Y. Halevy. Often (but not always) the senior author on the first few papers of a researcher is the doctoral advisor. Based on such a rule, we could deduce that Yehoshua Sagiv is Halevy's advisor. 
Moreover, deduction rules cover a wide range of RDF/S and OWL-based reasoning concepts, such as the owl:TransitiveProperty of predicates (e.g., for the rdfs:subClassOf property or over owl:sameAs links), which lie at the expressive intersection of Datalog-style deductive reasoning and OWL-DL. Additional consistency constraints, on the other hand, cover OWL concepts such as the owl:FunctionalProperty or owl:disjointWith properties of predicates and classes.</p><p>In summary, we emphasize the following desiderata for query-time reasoning over uncertain RDF triples: 1) give answers to complex SPARQL queries over triples with highly varying confidence values; 2) overcome the incompleteness problem by exploiting deduction rules, which may themselves be uncertain, to infer answers even if some critical triples are missing; 3) counter amplified noise and keep query-result precision high by taking consistency constraints into account in the specific context of a user query; and 4) achieve all of the above with high efficiency, so that queries can be answered with interactive response times of a few seconds. Our aim with URDF is to address the above desiderata in an integrated manner. Our implementation is based on top-down, on-demand grounding of rules expressed in first-order logic, together with a constrained form of weighted MaxSAT solving. Considering hard constraints jointly with MaxSAT reasoning over propositional clauses poses additional challenges; to our knowledge, these have not been addressed in prior work for interactive, query-time reasoning.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">DEFINITIONS AND NOTATIONS</head><p>We are given a set X of Boolean variables, each variable taking either the value true or false. The negation of a variable x ∈ X (denoted as ¬x) has the value true if and only if x is assigned false. We shall refer to a variable x or its negation ¬x as a literal. A Horn clause C is a set of literals containing at most one positive literal. Given a truth assignment to the variables, we shall say that a Horn clause is satisfied if it contains at least one literal whose value is true (clauses are interpreted as disjunctions of their literals). Every clause C is associated with a positive weight w(C) ∈ R. A Horn formula is defined as a conjunction of Horn clauses (hence Horn formulas are in conjunctive normal form, CNF). A Horn formula is satisfiable if there is a truth assignment to all variables such that all clauses are satisfied. As we deal with Horn formulas that might not be satisfiable, we seek a truth assignment that maximizes the total weight of the satisfied clauses. An example of a Horn formula is the following</p><formula xml:id="formula_0">(x1 ∨ ¬x2) ∧ (¬x2 ∨ x3), equivalent to (x1 ← x2) ∧ (x3 ← x2),</formula><p>where ← denotes logical implication.</p><p>Given a set of relation types R and a set E of entities, a fact f is defined as a triplet of the form f = (e1, e2, r), which expresses an instance of a binary relation of type r ∈ R for two entities e1, e2 ∈ E (e.g., (John, Yoko, marriedTo) could denote the fact that "John is married to Yoko", where John and Yoko are both entities). Moreover, facts may be uncertain. Hence, every fact f is also associated with a positive weight w(f) ∈ R, expressing the degree of confidence in the fact being correct. Furthermore, every fact is associated with a Boolean variable xf ∈ X, whose value indicates whether the corresponding fact is true or false. In the following, we simplify the notation and no longer distinguish between variables and facts. Hence, assigning the value true to a fact f corresponds to assigning true to the corresponding variable xf. We define a knowledge base KB as a triplet KB = {F, C, S}, where F is a set of facts, C is a set of Horn clauses, and S is a collection of disjoint sets of facts. We shall refer to C and S as soft deduction rules and hard consistency constraints, respectively (or soft and hard rules for short).</p></div>
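<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make these definitions concrete, the following minimal Python sketch (with a hypothetical data model; it illustrates the definitions above and is not URDF's actual implementation) represents a grounded knowledge base KB = {F, C, S} and evaluates the weighted MaxSAT objective for a given truth assignment:</p><code lang="python">
# Facts are plain identifiers; clauses are (set-of-literals, weight) pairs,
# where a literal is a (fact, polarity) tuple and a Horn clause contains at
# most one positive literal. Hard rules are disjoint sets of mutually
# exclusive facts.
F = ["marriedTo(John,Yoko)", "livesIn(John,NY)", "livesIn(Yoko,NY)"]

# livesIn(Yoko,NY) &lt;- marriedTo(John,Yoko) AND livesIn(John,NY)   [0.38]
C = [({("marriedTo(John,Yoko)", False),
       ("livesIn(John,NY)", False),
       ("livesIn(Yoko,NY)", True)}, 0.38)]

S = [{"bornOn(John,1931)", "bornOn(John,1940)", "bornOn(John,1957)"}]

def satisfied(clause, assign):
    """A clause (a disjunction of literals) is satisfied if any literal is true."""
    return any(assign[f] == pos for f, pos in clause)

def objective(clauses, assign):
    """Weighted MaxSAT objective: total weight of the satisfied clauses."""
    return sum(w for cl, w in clauses if satisfied(cl, assign))

def respects_hard_rules(hard_rules, assign):
    """At most one fact per hard rule may be assigned true."""
    return all(sum(assign.get(f, False) for f in s) &lt;= 1 for s in hard_rules)
</code></div>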
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Soft Deduction Rules</head><p>We consider weighted Horn clauses with exactly one positive head literal as soft rules. A soft rule could state, for example, that "if Yoko and John are married and John lives in NY, then also Yoko lives in NY", with a confidence of 0.38 of being correct, which we write as follows:</p><formula xml:id="formula_1">lives(Yoko, NY ) ← married(Yoko, John) ∧ lives(John, NY ) [0.38]</formula><p>To tackle the inherent incompleteness of F, we lift soft rules into first-order rules with universally quantified variables, which serve as input to our knowledge base. Soft rules then have the shape of first-order Horn clauses and can be written as Datalog-style implications. To express the first-order rule that "married couples usually live in the same place", for example, we use the following compact notation:</p><formula xml:id="formula_2">livesIn(p1 , loc1 ) ← marriedTo(p1 , p2 ) ∧ livesIn(p2 , loc1 ) [0.38]</formula></div>
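<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of how such a first-order soft rule is instantiated against base facts, the following sketch grounds the livesIn rule by a naive join over its body atoms (variables start with "?"; this is a simplified stand-in for URDF's SLD-based grounding, not the actual implementation):</p><code lang="python">
rule_head = ("livesIn", "?p1", "?loc1")
rule_body = [("marriedTo", "?p1", "?p2"), ("livesIn", "?p2", "?loc1")]

facts = {("marriedTo", "Yoko", "John"), ("livesIn", "John", "NY")}

def match(atom, fact, binding):
    """Extend `binding` so that `atom` equals `fact`, or return None."""
    b = dict(binding)
    for t, c in zip(atom, fact):
        if t.startswith("?"):
            if b.setdefault(t, c) != c:
                return None
        elif t != c:
            return None
    return b

def ground_rule(head, body, facts):
    """Yield one grounded head atom per consistent binding of the body."""
    bindings = [{}]
    for atom in body:
        bindings = [b2 for b in bindings for f in facts
                    if (b2 := match(atom, f, b)) is not None]
    for b in bindings:
        yield tuple(b.get(t, t) for t in head)

print(list(ground_rule(rule_head, rule_body, facts)))
# -> [('livesIn', 'Yoko', 'NY')], the new fact derived by the grounded rule
</code></div>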
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Hard Consistency Constraints</head><p>The set of facts F may contain inconsistent information. Hence, we introduce hard consistency constraints that take the shape of a collection of disjoint sets of mutually exclusive facts S = {S1, . . . , S|S|}. We enforce the constraint that, for each hard rule Sk, at most one fact f ∈ Sk may be assigned the value true. For example, if we observe two or more different birth dates for a person, clearly something went wrong either during extraction or when reasoning with soft rules. We can formally identify such an inconsistency by formulating a consistency constraint as follows:</p><formula xml:id="formula_3">(date1 = date2 ) ← bornOn(p1, date1) ∧ bornOn(p1, date2)</formula><p>Grounding this hard rule could then, for example, yield the following set of mutually exclusive facts {bornOn(John, 1931), bornOn(John, 1940), bornOn(John, 1957)}, which specifies that John could be born either in 1931, 1940, or 1957, or in none of these years. In contrast to soft rules, hard rules may not be violated by any truth assignment to the corresponding Boolean variables.</p></div>
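<div xmlns="http://www.tei-c.org/ns/1.0"><p>Grounding such functional-dependency constraints can be pictured as grouping conflicting facts by their "key" argument, as in the following sketch (a hypothetical helper, not URDF's grounding code):</p><code lang="python">
from collections import defaultdict

facts = [("bornOn", "John", "1931"),
         ("bornOn", "John", "1940"),
         ("bornOn", "John", "1957"),
         ("bornOn", "Yoko", "1933")]

def exclusion_sets(facts, functional_predicate):
    """All facts of a functional predicate that share the same subject are
    mutually exclusive; each group with a conflict yields one hard rule S_k."""
    groups = defaultdict(set)
    for pred, subj, obj in facts:
        if pred == functional_predicate:
            groups[subj].add((pred, subj, obj))
    return [g for g in groups.values() if len(g) > 1]

S = exclusion_sets(facts, "bornOn")
# One set with John's three conflicting birth dates; Yoko's single bornOn
# fact needs no constraint (URDF also adds unit hard rules per fact, see
# Section 2.3).
</code></div>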
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Problem Definition</head><p>As we deal with Horn formulas that might not be satisfiable, we seek a truth assignment that maximizes the total weight of the satisfied clauses. We call this problem weighted MaxSAT with soft and hard rules, which we formally define as follows. We are given a set of facts F = {f1, f2, . . . , fn}, a collection C = {C1, . . . , Cm} of clauses in Horn form (soft rules), and a collection of sets S = {S1, . . . , St} that partition F (hard rules). Each clause C ∈ C is associated with a positive weight w(C). We wish to find a truth assignment to each variable f ∈ F such that for each set in S at most one fact is assigned the value true, and the sum of the weights of the satisfied clauses is maximized.</p><p>Given a knowledge base KB = {F, C, S} (in grounded form), we define an instance of the MaxSAT problem with soft and hard rules as follows. Every fact fi ∈ F is associated with a Boolean variable. In addition, we introduce a unit clause Ci = {fi}, whose weight is equal to the confidence of the corresponding fact. For convenience of notation, for each fact fi, we also introduce a unit hard rule Si = {fi} into S. As the MaxSAT problem with hard and soft rules is a generalization of the classic MaxSAT problem with Horn clauses, which is NP-hard <ref type="bibr" target="#b10">[11]</ref>, our problem is NP-hard as well. Since computing an optimum solution for the above problem is intractable, we devise an approximation algorithm.</p><p>We remark that, instead of hard rules, one could also enforce consistency constraints by introducing soft rules with high weights. However, in combination with an approximation algorithm, this approach involves non-trivial issues, as illustrated by the following example. Consider the two facts bornOn(John, 1931) and bornOn(John, 1940), whose weights are 0 and w ≫ 0, respectively. In order to enforce the hard rule that John can have only one date of birth, we could introduce the soft rule (¬x ∨ ¬y) with some weight W, where x and y are the Boolean variables associated with the facts bornOn(John, 1931) and bornOn(John, 1940), respectively. However, it is not clear how to determine the weight W of such a soft rule. If W is not large enough, we cannot enforce the hard consistency constraint. On the other hand, if W is too large, then any (1+ε)-approximation algorithm (for ε &gt; 0) might assign true to bornOn(John, 1931), as the ratio between such a solution and the optimum is W/(W+w).</p></div>
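<div xmlns="http://www.tei-c.org/ns/1.0"><p>The ratio argument can be checked numerically. In the following sketch (with arbitrary example weights), the suboptimal solution that picks the weight-0 fact looks nearly optimal to any approximation algorithm, even though it selects the wrong birth date:</p><code lang="python">
w, W = 5.0, 1000.0  # fact weight w and a (hypothetical) penalty-clause weight W

# true(bornOn(John,1940)): penalty clause (W) and fact clause (w) both satisfied
optimum = W + w
# true(bornOn(John,1931)): penalty clause satisfied, but only the weight-0 fact
suboptimal = W

print(suboptimal / optimum)  # 0.995..., i.e., W/(W+w) approaches 1 as W grows
</code></div>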
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">ALGORITHM</head><p>Our algorithm is inspired by Johnson's approximation algorithm for the original weighted MaxSAT problem <ref type="bibr" target="#b11">[12]</ref> (with no additional consistency constraints). It is based on the method of conditional probabilities <ref type="bibr" target="#b0">[1]</ref>, which allows one to convert a randomized approximation algorithm into a deterministic one while preserving the approximation guarantee.</p><p>The first step is to compute a real value pi in [0, 1] for each fact fi, satisfying the following property: the sum of all pi's corresponding to the facts within the same hard rule is at most 1. Each pi is interpreted as the probability that fi is assigned true and is computed in such a way that pi is large when the confidence in fact fi being true is high (i.e., w(fi) is large) and small otherwise. A simple algorithm for computing the pi's proceeds as follows: for each hard rule S, with probability 1/2 assign true to the fact with the largest weight in S, and with probability 1/2 assign false to all facts in S. This gives a 1/2-approximation algorithm; however, there are more sophisticated algorithms that work better in practice while maintaining the approximation guarantee (see our technical report <ref type="bibr" target="#b26">[27]</ref> for more details). Then, at each step t, we consider a hard rule St and determine a truth assignment for all facts in St that maximizes a carefully defined potential function. Our potential function can be interpreted as the expected total weight of satisfied clauses when each unassigned fact fi is assigned true with probability pi (independently of the facts not in St).</p><p>Formally, we denote with Wt the value of our potential function at step t. At the beginning of our algorithm, all facts are unassigned and the value of our potential function (W0) is defined as</p><formula xml:id="formula_4">W0 = Σ_{C ∈ C} w(C) · P(C is satisfied),</formula><p>where the probability that a clause is satisfied is a function of the pi's computed at the beginning of our algorithm.</p><p>At step t, let φt−1 denote the truth assignment fixed so far for the facts f1, . . . , ft−1, and let St be the hard rule considered at step t. We denote with St = false the truth assignment that assigns false to all facts in St. For every f in St, we define</p><formula xml:id="formula_5">W_{t,f=true} = Σ_{C ∈ C} w(C) · P(C is sat. | φt−1, f = true),</formula><p>where P(A|B) denotes a conditional probability. When all facts in St are assigned false, our potential function is denoted as</p><formula xml:id="formula_6">W_{t,St=false} = Σ_{C ∈ C} w(C) · P(C is sat. | φt−1, St = false).</formula><p>Our algorithm determines the truth assignment that maximizes the current potential function by choosing the largest value among all W_{t,f=true}'s and W_{t,St=false}, and then assigns the corresponding truth values to the facts in St. At each iteration, all satisfied clauses are removed from the set of clauses.</p><p>Our algorithm stops when all facts have been assigned a truth value. We remark that our algorithm is completely deterministic (i.e., it always produces the same output given the same input). Algorithm 1 shows pseudocode for our algorithm, while the approximation guarantee of our algorithm is stated in Theorem 1.</p><p>Algorithm 1 Weighted MaxSAT with Soft and Hard Rules 1: For each hard rule, compute a probability distribution over its facts so that for each fact fi, pi is proportional to w(fi), see <ref type="bibr" target="#b26">[27]</ref>. 2: For each hard rule St ∈ S: 3: • Let f be the fact with the largest W_{t,f=true} among all facts in St. • If W_{t,f=true} ≥ W_{t,St=false} then assign true to f, else assign false to all facts in St. 4: • Remove all satisfied clauses.</p><p>Due to space constraints, we defer the proof of Theorem 1 to <ref type="bibr" target="#b26">[27]</ref>. We are not aware of any closed-form formula to express the solutions of the equation p = 1 − p^λ as a function of λ. In the case λ = 2 we obtain p ≈ 0.618, while in the case λ = 3 we obtain a ratio of roughly 0.68. We can also show an approximation guarantee of 0.83 in some cases of interest (see <ref type="bibr" target="#b26">[27]</ref>). Our algorithm reaches its worst-case running time when every fact occurs in every grounded soft rule, i.e., when |C| = |F| for all C ∈ C. In practice, this case is unlikely, and indeed our experiments confirm the efficiency of our algorithm in real-world applications (see Figure <ref type="figure" target="#fig_2">1a</ref>).</p></div>
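<div xmlns="http://www.tei-c.org/ns/1.0"><p>The following Python sketch re-implements the greedy derandomization of Algorithm 1 under the data model of the earlier sketches (clauses as (set-of-literals, weight) pairs, hard rules as a partition of the facts, and p[f] the probabilities from step 1, summing to at most 1 within each hard rule). It is an illustrative re-implementation, not the authors' code; removing satisfied clauses is only a performance optimization and is omitted here, since the potential function already counts such clauses with probability 1:</p><code lang="python">
def sat_prob(clause, assign, p):
    """P(clause is satisfied | partial assignment), where each unassigned
    fact f is independently true with probability p[f]."""
    all_false = 1.0
    for f, pos in clause:
        if f in assign:
            if assign[f] == pos:          # an assigned literal is already true
                return 1.0
        else:
            all_false *= (1.0 - p[f]) if pos else p[f]
    return 1.0 - all_false

def potential(clauses, assign, p):
    """Expected total weight of satisfied clauses (the function W_t)."""
    return sum(w * sat_prob(cl, assign, p) for cl, w in clauses)

def maxsat_soft_hard(clauses, hard_rules, p):
    """Deterministically fix one hard rule per step, choosing between
    'exactly one fact in S_t true' and 'all facts in S_t false'."""
    assign = {}
    for St in hard_rules:
        # W_{t,f=true}: f true and all other facts in St false
        best = max(({**assign, **{g: g == f for g in St}} for f in St),
                   key=lambda a: potential(clauses, a, p))
        # W_{t,St=false}: all facts in St false
        all_false = {**assign, **{g: False for g in St}}
        if potential(clauses, all_false, p) >= potential(clauses, best, p):
            best = all_false
        assign = best
    return assign
</code></div>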
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">REASONING FRAMEWORK</head><p>In the absence of any soft and hard rules, URDF conforms to a standard (conjunctive) SPARQL engine where all facts in F are assumed to be true. Our key observation for query answering in combination with MaxSAT solving is that often only a small subset of the facts in F (frequently several orders of magnitude fewer facts than are contained in the entire knowledge base) is relevant for answering a specific query and for finding the corresponding truth assignments to the facts that are related to the query. For this purpose, we define the dependency graph DQ ⊆ F of a query Q as stated in Definition 1. Definition 1 already yields a recursive algorithm to compute the dependency graph, which is similar to SLD resolution <ref type="bibr" target="#b2">[3]</ref> as used in Datalog and Prolog. In our case, SLD resolution is extended by an additional grounding step for hard rules, i.e., whenever we ground a fact fi, we also iteratively ground all hard rules that are related to it, using fi as a new subquery 1 . The URDF reasoning steps are summarized in Algorithm 2. 1 We employ a form of tabling (i.e., memoization) in order to cache redundant subgoals. This table also serves to break potential cycles in SLD resolution if the same rule is attempted to be grounded repeatedly with the same bindings of variables to constants. Algorithm 2 URDF Reasoning Framework Require: A knowledge base KB with base facts F, first-order soft rules C and first-order hard rules S, and a conjunctive query Q 1: Initialize the dependency graph DQ = ∅. 2: Ground all literals qi ∈ Q via SLD resolution and add their intersection to DQ. 3: Let CQ, SQ denote the sets of soft and hard rules grounded for answering Q. 4: Expand DQ by all facts f in the grounded rules CQ and SQ. 5: Construct a CNF formula over the grounded clauses CQ and the individual facts DQ ⊆ F. 6: Solve the constrained weighted MaxSAT problem over the CNF and the sets SQ (Algorithm 1). 7: return DQ with truth assignments to all facts f ∈ DQ Given a set of first-order rules, this form of deductive grounding has a well-known polynomial runtime for non-recursive Datalog programs as well as for linear recursive programs. However, it has exponential complexity (in the number of facts) already for Datalog programs with a single non-linear recursive rule <ref type="bibr" target="#b8">[9]</ref>. Line 3 captures the rules that were grounded during this resolution phase in order to construct a Boolean formula in conjunctive normal form (CNF). These grounded rules are already available from the previous SLD resolution and can be kept in a simple buffer of the algorithm. The CNF construction in Line 5 itself is linear in the size of the grounded rules CQ and SQ, because all grounded soft rules are already in clause form, while the grounded hard rules can be input into our MaxSAT solver directly as plain sets of mutually exclusive facts. The next step in Line 6 requires the execution of Algorithm 1 for the weighted MaxSAT problem (with both soft and hard rules) described in Section 3.</p><p>We remark that the above form of dependency graph construction guarantees truth assignments to query answers that are consistent with the truth assignments that would be found by the MaxSAT solver for the entire sets of facts F, clauses C, and constraints S. That is, the truth assignments to facts in the dependency graph DQ ⊆ F for any query Q after MaxSAT solving are equivalent to the truth assignments that would be obtained for these facts when running the MaxSAT solver over the entire set of facts F in the knowledge base (modulo ties and possible MaxSAT approximation errors).</p></div>
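<div xmlns="http://www.tei-c.org/ns/1.0"><p>A simplified sketch of this top-down grounding step is shown below: goals are expanded against base facts and rule bodies, with a table that caches the atoms derived per predicate and thereby breaks direct cycles (an in-progress predicate simply returns the atoms derived so far). This illustrates the dependency-graph construction only; it assumes range-restricted rules, handles mutual recursion only approximately, and omits URDF's additional grounding of hard rules and the CNF construction:</p><code lang="python">
def match(atom, fact, binding):
    """Extend `binding` so `atom` (terms; variables start '?') equals `fact`."""
    b = dict(binding)
    for t, c in zip(atom, fact):
        if t.startswith("?"):
            if b.setdefault(t, c) != c:
                return None
        elif t != c:
            return None
    return b

def derive(pred, facts, rules, table):
    """All ground atoms derivable for `pred`, memoized in `table`."""
    if pred in table:                 # cached or in progress: breaks cycles
        return table[pred]
    table[pred] = derived = {f for f in facts if f[0] == pred}
    changed = True
    while changed:                    # fixpoint over the rules deriving `pred`
        changed = False
        for head, body in rules:
            if head[0] != pred:
                continue
            for b in ground_body(list(body), {}, facts, rules, table):
                g = tuple(b.get(t, t) for t in head)
                if g not in derived:
                    derived.add(g)
                    changed = True
    return derived

def ground_body(body, binding, facts, rules, table):
    """Yield all variable bindings that satisfy the body atoms in order."""
    if not body:
        yield binding
        return
    atom, rest = body[0], body[1:]
    for f in list(derive(atom[0], facts, rules, table)):  # snapshot the set
        b = match(atom, f, binding)
        if b is not None:
            yield from ground_body(rest, b, facts, rules, table)

def answers(query_atom, facts, rules):
    """Ground a single query atom, e.g., ('livesIn', '?p', 'NY')."""
    table = {}
    return [f for f in derive(query_atom[0], facts, rules, table)
            if match(query_atom, f, {}) is not None]
</code></div>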
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">EXPERIMENTS</head><p>The following experiments were run on an AMD Opteron Quad-Core 2.6 GHz server with 16 GB RAM, using Oracle 11g as the storage back-end for the underlying knowledge base. Physical I/Os were cached (thus aiming to eliminate variance due to disk operations) by running each query once and then taking the average runtime over 5 immediately repeated runs. Memory consumption was generally not a limiting factor, with up to 501 MB overall memory consumption for our URDF Java implementation (including the high overhead of the Java VM) and less than 10 MB for the Alchemy package (see Subsection 5.2), which is implemented in C++.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">YAGO Knowledge Base, Rules and Queries</head><p>The semantic graph of YAGO <ref type="bibr" target="#b24">[25]</ref> serves as the basis for our experiments. YAGO is a large common-sense knowledge base that has been extracted automatically from Wikipedia articles. YAGO contains more than 2 million entities and 19 million facts. The facts include a class hierarchy of 200,000 classes with about 100 distinct relation types. Moreover, we employ 16 (partially recursive) hand-crafted soft deduction rules for common-sense reasoning about people and their relationships, together with 10 queries of different shapes. We enforce functional properties of the predicates bornIn, bornOnDate, diedIn, diedOnDate and marriedTo as consistency constraints. As weights for base facts, we employ the confidence weights provided by YAGO, while the weight for a soft rule is calculated as the conditional probability for the entire rule to hold (including the head literal), given that the body of the rule holds (when grounded over YAGO; see <ref type="bibr" target="#b26">[27]</ref> for a detailed description of the rules and queries). Queries were chosen such that many query predicates are defined via deduction rules, which led to a recursion depth of up to 7 in our experiments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Markov Logic: MAP and MC-SAT</head><p>Alchemy 2 provides a series of algorithms for statistical relational learning and probabilistic inference based on Markov Logic Networks (MLNs) <ref type="bibr" target="#b22">[23]</ref>. It implements various MCMC sampling techniques, including MAP inference <ref type="bibr" target="#b23">[24]</ref> (a memory-efficient stochastic MaxSAT solver based on MaxWalkSAT) and MC-SAT <ref type="bibr" target="#b19">[20]</ref>. MAP inference yields truth assignments to facts (which allows for precision comparisons with URDF), whereas MC-SAT computes probability distributions over the underlying Markov network (which URDF does not do). Thus, we refer to MC-SAT merely for runtime comparisons, as a state-of-the-art technique for MCMC sampling. We found that grounding the above rules and queries over the entire YAGO knowledge base in Alchemy under an open-world assumption is not feasible, due to the nearly quadratic inflation of the resulting MLN structure. Hence, we provide the facts and rules grounded by URDF (via SLD resolution) directly as input to Alchemy, which effectively leads to a closed-world grounding of rules in the corresponding MLN structure. MLN running times (for MAP and MC-SAT) thus mostly correspond to the time needed for inference by Alchemy over this much smaller network structure (plus some overhead for parsing the formulas and grounding the network in closed-world mode).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.1">Basic Query Processing over YAGO.</head><p>The first setting reports running times and result precision for the URDF reasoner compared to MAP inference and MC-SAT over the basic YAGO setting. As for approximation quality, we measure the relative precision of the URDF MaxSAT solver compared to the MAP baseline: if the MaxSAT weight computed by URDF (MaxSAT-W) is at least as large as the weight achieved by MAP inference (MAP-W), and none of the hard constraints are violated by either approach, we conclude that we found an equally good or even better solution. Grounding time (SLD) denotes the time needed to ground the query atoms and the soft and hard rules, and to expand the dependency graph via top-down SLD resolution. In the following, #C and #S denote the numbers of grounded soft and hard rules (including unit clauses), while |C| and |S| denote the numbers of occurrences of facts in all grounded soft and hard rules, respectively. The overall query response time (in ms) for URDF is the sum of SLD grounding and MaxSAT solving. Table <ref type="table" target="#tab_0">1</ref> shows that the URDF MaxSAT solver achieves more than two orders of magnitude runtime improvement over MAP and MC-SAT, at 90 percent precision compared to the MAP baseline for queries Q1-Q10. That is, for 9 out of the 10 queries URDF finds the same MaxSAT weight as MAP, while only for query Q7 the weight returned by URDF is marginally worse. We also see that running MC-SAT is generally more expensive than MAP inference. In this basic setting, SLD grounding clearly dominates query response times for URDF.</p><p>However, for all queries we achieve interactive response times, with an overall runtime of at most 3 seconds per query.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.2">Synthetic Rule Expansions.</head><p>To systematically investigate the asymptotic behavior of our algorithm for large dependency graphs and more complex rules, we employ synthetic expansions over the basic YAGO setting. In the first expansion setting (Figure <ref type="figure" target="#fig_1">1</ref>(a)), we expand each grounded soft rule by 10 distinct facts as noise per expansion step, for each of the query results depicted in Table <ref type="table" target="#tab_0">1</ref>. For the noisy facts, we apply uniform weights as well as weights following Gaussian distributions (with σ² = 1) around the original weights (µ) of the YAGO facts. We thus simulate very complex CNF formulas with more than 10^5 occurrences of facts in soft rules. In the following plots, the grounding time also includes the time for expanding the CNF formulas with these noisy facts and thus is not constant over all runs. Figure <ref type="figure" target="#fig_1">1</ref>(a) confirms that the runtime of the MaxSAT algorithm is not affected by the weighting schemes and remains equally efficient.</p><p>In the second expansion setting (Figure <ref type="figure" target="#fig_1">1</ref>(b)), we vary not only the number of facts per soft rule, but also the number of grounded soft and hard rules by replicating rules with different combinations of noisy facts. That is, for each original grounded fact, we introduce 10 mutually exclusive facts as noise, and, for each soft rule, we expand the CNF by introducing 10 randomly expanded soft rules at each expansion step. Overlap among soft rules is achieved by uniform-randomly drawing the noisy facts from a static pool of 1,000 synthetic facts for all expansions. We keep Gaussian weights for the expanded facts. This setting yields very large CNF formulas with |C| + |S| ranging from 2,312 to 2,321,488. More specifically, we create constraints with up to 21,277 soft rules and 2,311,551 occurrences of facts in all soft rules, over an overall amount of only 9,937 distinct facts for the last expansion step. In this step, each fact occurs on average in more than 232 rules, each with an average rule length of 108 facts. URDF solves the CNF formulas for this expansion in 15.7 seconds, while computing MaxSAT weights comparable to those found by MAP. We measure the approximation precision only for the first three expansion steps, yielding MaxSAT weights of 853.74, 975.02, and 1,099.34 for URDF, and 854.58, 937.74, and 1,069.70 for MAP, respectively. MAP does not scale to larger CNFs than in these first three expansions, while MC-SAT could not be run beyond the first expansion step.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.3">Inductively Learned Rules.</head><p>The previous experiments were run over the entire YAGO knowledge base, using 16 hand-crafted rules in a fully recursive fashion. To measure the cost of deduction over such a large knowledge base with more general rules, we conduct runs over 42 recursive rules learned inductively by a variant of FOIL <ref type="bibr" target="#b20">[21]</ref>, using YAGO as background knowledge. Figure <ref type="figure" target="#fig_3">2</ref>(a) depicts the runtimes (in seconds) for grounding queries Q1-Q10 over YAGO using these rules, while breaking off SLD resolution at different deduction levels. Results for the LUBM benchmark are summarized in Figure <ref type="figure" target="#fig_3">2</ref>(b).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">RELATED WORK</head><p>Reasoning with inconsistency and uncertainty has lately been gaining significant attention in the Database, Semantic Web, and Machine Learning communities. In contrast to related work on uncertain and probabilistic databases, we consider a more dynamic way of querying and reasoning with deduction rules for intensional relations. All database approaches for uncertain data we are aware of (for example Trio <ref type="bibr" target="#b28">[29]</ref>, MayBMS <ref type="bibr" target="#b1">[2]</ref>, and MystiQ <ref type="bibr" target="#b4">[5]</ref>) limit queries to materialized data. Some systems support views <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b28">29]</ref>, but only in materialized form, which is equivalent to comprehensive and thus expensive bottom-up grounding in Datalog. Moreover, we also found recent approaches that adapt inference methods for probabilistic graphical models to database systems, such as <ref type="bibr" target="#b17">[18]</ref> or <ref type="bibr" target="#b27">[28]</ref>, to be primarily designed for batch processing and not well suited for interactive querying. A similar observation also holds for Probabilistic Logic Programming (PLP) and Answer Set Programming (ASP). PLP combines logic programming (including Datalog) with probabilistic reasoning. ProbLog <ref type="bibr" target="#b21">[22]</ref> employs SLD resolution <ref type="bibr" target="#b2">[3]</ref> for grounding first-order rules, together with Monte Carlo sampling or binary decision diagrams for probabilistic inference over Boolean formulas obtained from SLD proofs. ASP, on the other hand, is a more generic paradigm for solving combinatorial problems with logic programs. In <ref type="bibr" target="#b18">[19]</ref>, the authors study tractable subclasses of ASP with cardinality constraints and weights. In Probabilistic Description Logic (PDL) <ref type="bibr" target="#b13">[14]</ref> (see <ref type="bibr" target="#b14">[15]</ref> for an overview), probability ranges can be attached to subsumption (or instance-of) statements. PDL generalizes the description logic SHOIN(D) and can thus express functional rules. However, PDL cannot deal with truly inconsistent input statements. Thus, in order to apply PDL to a knowledge base with noisy confidence scores in place of the probabilities, the probability ranges would have to be reconciled upfront, which amounts to solving the inconsistencies before starting the reasoner.</p><p>Statistical Relational Learning (SRL) has been gaining strong momentum in both the Database and Machine Learning communities, with Markov Logic Networks (MLNs) <ref type="bibr" target="#b22">[23]</ref> probably being one of the most generic approaches for combining first-order logic and probabilistic graphical models. In these classes of graphical models, sampling methods based on Markov Chain Monte Carlo (MCMC) provide a family of efficient approximation algorithms for probabilistic inference. Specifically, the algorithms employed in Markov Logic for maximum-a-posteriori (MAP) inference <ref type="bibr" target="#b23">[24]</ref> and MC-SAT <ref type="bibr" target="#b19">[20]</ref> constitute two state-of-the-art extensions of MaxWalkSAT-based stochastic MaxSAT solvers <ref type="bibr" target="#b12">[13]</ref> and Gibbs-style sampling techniques, respectively. Our approach diverges from Markov Logic in two basic aspects: grounding and inference. 
Grounding a first-order Markov network under open-world semantics works by binding all entities (constants) to the variables in the first-order rules that match the type signatures of the respective predicates. For binary predicates, this may result in grounded networks that are often nearly quadratic in the number of entities in the knowledge base. Unlike Markov Logic, we specifically focus on query-time reasoning, with a deductive (closed-world) grounding of soft rules.</p><p>In <ref type="bibr" target="#b25">[26]</ref>, we presented a "self-organizing framework for information extraction", where the main problem was also formulated as a MaxSAT problem. The main difference between this work and <ref type="bibr" target="#b25">[26]</ref> is that here we deal with hard and soft rules in a more principled way, whereas in <ref type="bibr" target="#b25">[26]</ref> hard rules are enforced by introducing soft rules with large weights (see Section 2 for a discussion of why a more principled approach is needed). Moreover, we present an algorithm with an approximation guarantee of 0.83 in some cases of interest. The MaxSAT problem is well studied in the theoretical computer science community (see, for example, <ref type="bibr" target="#b7">[8]</ref>). Here, we focus on the effectiveness of our algorithms in solving real-life problems. In <ref type="bibr" target="#b16">[17]</ref>, we presented an initial demo of our interactive reasoning framework.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">CONCLUSIONS</head><p>We presented a query-time reasoning approach for uncertain RDF data and SPARQL queries over a combination of soft deduction rules and hard consistency constraints. URDF employs a generalized weighted MaxSAT algorithm that guarantees consistent query answers with regard to the hard constraints. Our experiments confirm that our MaxSAT approximation algorithm yields interactive response times over formulas with many thousands of grounded rules and facts, while empirically performing much better than the proven lower bound of 1/2 for the approximation guarantee. We also saw that, in many cases, the grounding time for the Datalog-style soft deduction rules is the actual limiting factor for query response times. Our future work will investigate how to further scale up inference by combining dynamic grounding over the highly transient parts of the knowledge base with materialized facts for the more static parts, as well as by distributing our grounding and MaxSAT-based inference strategies.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>THEOREM 1. COROLLARY 1.</head><figDesc>Theorem 1: Given a set C of Horn clauses and a set S of hard rules, let λ be the minimum number of negated literals in any Horn clause that has at least two literals. Our algorithm is a p-approximation algorithm for the MaxSAT problem with soft and hard rules, where p is obtained by solving the equation p = 1 − p^λ. The running time of our algorithm is O(m·n) in the worst case, with m = Σ_{c ∈ C} |c| and n = Σ_{s ∈ S} |s|. Corollary 1: Our algorithm has an approximation guarantee of 1/2 for general Horn clauses.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>DEFINITION 1.</head><label>1</label><figDesc>Given a knowledge base KB = {F, C, S} and a conjunctive query Q: • All possible groundings fi ∈ F of the query atoms qj ∈ Q are facts in the dependency graph DQ of Q. • If a grounded fact fn ∈ F is in DQ, then all grounded facts f1, . . . , fn−1 ∈ F of all grounded soft rules C ∈ C, in which fn is the head, are also in DQ. • If a grounded fact fi ∈ F is in DQ, then all grounded facts f1, . . . , fk ∈ F of the grounded hard rule S ∈ S, which are mutually exclusive to fi, are also in DQ.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: Asymptotic MaxSAT behavior (a) and MaxSAT vs. MAP and MC-SAT (b).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 2:</head><label>2</label><figDesc>Figure 2: SLD deduction levels (a) and LUBM results (b).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>5.2.4</head><figDesc>LUBM Benchmark.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 2(b)</head><label>2</label><figDesc>Figure 2(b) shows a comparison of URDF runtimes (in seconds, including both SLD and MaxSAT) versus the Jena Semantic Web toolkit 3 (using PostgreSQL 8.3 as storage backend) over the LUBM benchmark setting [10] at scale factors (SF) 1, 5 and 10. Performance for URDF is similar to that observed on YAGO, where MaxSAT times are again much faster than SLD grounding times. URDF outperforms Jena even in the grounding step by a large margin, while Jena failed to respond to many queries at increasing scale factors (see also <ref type="bibr" target="#b9">[10]</ref> for detailed results on LUBM).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1:</head><label>1</label><figDesc>Basic YAGO results.</figDesc><table><row><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>URDF/MLN</cell><cell>URDF</cell><cell>MLN</cell><cell>MLN</cell><cell>URDF</cell><cell>MLN</cell></row><row><cell></cell><cell>#C</cell><cell>|C|</cell><cell>#S</cell><cell>|S|</cell><cell>SLD(ms)</cell><cell>MaxSAT(ms)</cell><cell>MAP(ms)</cell><cell>MC-SAT(ms)</cell><cell>MaxSAT-W</cell><cell>MAP-W</cell></row><row><cell>Q1</cell><cell>49</cell><cell>109</cell><cell>22</cell><cell>25</cell><cell>243</cell><cell>1</cell><cell>80</cell><cell>65</cell><cell>19.92</cell><cell>19.92</cell></row><row><cell>Q2</cell><cell>14</cell><cell>20</cell><cell>10</cell><cell>12</cell><cell>53</cell><cell>1</cell><cell>814</cell><cell>17</cell><cell>9.69</cell><cell>9.69</cell></row><row><cell>Q3</cell><cell>32</cell><cell>40</cell><cell>27</cell><cell>28</cell><cell>25</cell><cell>7</cell><cell>814</cell><cell>17</cell><cell>20.56</cell><cell>20.56</cell></row><row><cell>Q4</cell><cell>46</cell><cell>82</cell><cell>23</cell><cell>30</cell><cell>178</cell><cell>1</cell><cell>861</cell><cell>221</cell><cell>20.77</cell><cell>20.77</cell></row><row><cell>Q5</cell><cell>176</cell><cell>203</cell><cell>167</cell><cell>167</cell><cell>3,062</cell><cell>13</cell><cell>1,564</cell><cell>60,970</cell><cell>161.93</cell><cell>161.93</cell></row><row><cell>Q6</cell><cell>318</cell><cell>342</cell><cell>307</cell><cell>310</cell><cell>584</cell><cell>4</cell><cell>111</cell><cell>173</cell><cell>292.40</cell><cell>292.40</cell></row><row><cell>Q7</cell><cell>100</cell><cell>220</cell><cell>41</cell><cell>48</cell><cell>222</cell><cell>4</cell><cell>1,344</cell><cell>1,032</cell><cell>27.06</cell><cell>27.91</cell></row><row><cell>Q8</cell><cell>195</cell><cell>199</cell><cell>192</cell><cell>193</cell><cell>93</cell><cell>7</cell><cell>1,877</cell><cell>36,330</cell><cell>188.21</cell><cell>188.21</cell></row><row><cell>Q9</cell><cell>44</cell><cell>71</cell><cell>27</cell><cell>35</cell><cell>143</cell><cell>7</cell><cell>1,407</cell><cell>142</cell><cell>26.37</cell><cell>26.37</cell></row><row><cell>Q10</cell><cell>89</cell><cell>89</cell><cell>89</cell><cell>89</cell><cell>71</cell><cell>4</cell><cell>283</cell><cell>5,267</cell><cell>86.83</cell><cell>86.83</cell></row><row><cell>Σ</cell><cell>1,063</cell><cell>1,375</cell><cell>905</cell><cell>937</cell><cell>4,674</cell><cell>49</cell><cell>9,155</cell><cell>104,234</cell><cell>853.74</cell><cell>854.58</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">http://alchemy.cs.washington.edu/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">http://jena.sourceforge.net/</note>
		</body>
		<back>


			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">The Probabilistic Method</title>
		<author>
			<persName><forename type="first">N</forename><surname>Alon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Spencer</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1992">1992</date>
			<publisher>John Wiley</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">MayBMS: Managing incomplete information with probabilistic world-set decompositions</title>
		<author>
			<persName><forename type="first">L</forename><surname>Antova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Koch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Olteanu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICDE</title>
				<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Contributions to the theory of logic programming</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">R</forename><surname>Apt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">H</forename><surname>Van Emden</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. ACM</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="841" to="862" />
			<date type="published" when="1982">1982</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">DBpedia: a nucleus for a web of open data</title>
		<author>
			<persName><forename type="first">S</forename><surname>Auer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kobilarov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Cyganiak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">G</forename><surname>Ives</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ISWC</title>
				<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="722" to="735" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">MYSTIQ: a system for finding more answers by using probabilities</title>
		<author>
			<persName><forename type="first">J</forename><surname>Boulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Dalvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mandhani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mathur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ré</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Suciu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGMOD</title>
				<imprint>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="891" to="893" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Coupled semi-supervised learning for information extraction</title>
		<author>
			<persName><forename type="first">A</forename><surname>Carlson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Betteridge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">R H</forename><genName>Jr</genName></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Mitchell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">WSDM</title>
				<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="101" to="110" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Learning and inference with constraints</title>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L.-A</forename><surname>Ratinov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Rizzolo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Roth</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AAAI</title>
				<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="1513" to="1518" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">New 3/4-approximation algorithms for the maximum satisfiability problem</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">X</forename><surname>Goemans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Williamson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIAM J. Discr. Math</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="656" to="666" />
			<date type="published" when="1994">1994</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">On the complexity of single-rule Datalog queries</title>
		<author>
			<persName><forename type="first">G</forename><surname>Gottlob</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Papadimitriou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Inf. Comput</title>
		<imprint>
			<biblScope unit="volume">183</biblScope>
			<biblScope unit="page" from="104" to="122" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">LUBM: A benchmark for OWL knowledge base systems</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Heflin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Web Semantics</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">2-3</biblScope>
			<biblScope unit="page" from="158" to="182" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">On the complexity of the maximum satisfiability problem for Horn formulas</title>
		<author>
			<persName><forename type="first">B</forename><surname>Jaumard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Simeone</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Processing Letters</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="4" />
			<date type="published" when="1987">1987</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Approximation algorithms for combinatorial problems</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Johnson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Computer and System Sciences</title>
		<imprint>
			<biblScope unit="page" from="256" to="278" />
			<date type="published" when="1974">1974</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A general stochastic approach to solving problems with hard and soft constraints</title>
		<author>
			<persName><forename type="first">H</forename><surname>Kautz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Selman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Jiang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Satisfiability Problem: Theory and Applications</title>
				<imprint>
			<publisher>American Mathematical Society</publisher>
			<date type="published" when="1996">1996</date>
			<biblScope unit="page" from="573" to="586" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Probabilistic description logic programs</title>
		<author>
			<persName><forename type="first">T</forename><surname>Lukasiewicz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Int. J. Approx. Reasoning</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="288" to="307" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Managing uncertainty and vagueness in description logics for the semantic web</title>
		<author>
			<persName><forename type="first">T</forename><surname>Lukasiewicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Straccia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. of Web Semantics</title>
		<imprint>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="291" to="308" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">FACTORIE: Probabilistic programming via imperatively defined factor graphs</title>
		<author>
			<persName><forename type="first">A</forename><surname>Mccallum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Schultz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Singh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">NPIS</title>
				<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="1249" to="1257" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Interactive reasoning in uncertain RDF knowledge bases</title>
		<author>
			<persName><forename type="first">T</forename><surname>Meiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dylla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Theobald</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CIKM</title>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="2557" to="2560" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Tuffy: Scaling up statistical inference in Markov Logic Networks using an RDBMS</title>
		<author>
			<persName><forename type="first">F</forename><surname>Niu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ré</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Doan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Shavlik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PVLDB</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="373" to="384" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Tractable answer-set programming with weight constraints: Bounded treewidth is not enough</title>
		<author>
			<persName><forename type="first">R</forename><surname>Pichler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rümmele</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Szeider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Woltran</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">KR</title>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">A general method for reducing the complexity of relational inference and its application to MCMC</title>
		<author>
			<persName><forename type="first">H</forename><surname>Poon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Domingos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sumner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AAAI</title>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="1075" to="1080" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Learning logical definitions from relations</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Quinlan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Machine Learning</title>
				<imprint>
			<date type="published" when="1990">1990</date>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="239" to="266" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">ProbLog: A probabilistic Prolog and its application in link discovery</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">D</forename><surname>Raedt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kimmig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Toivonen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IJCAI</title>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="2462" to="2467" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Markov Logic Networks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Richardson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Domingos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Machine Learning</title>
				<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="volume">62</biblScope>
			<biblScope unit="page" from="107" to="136" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Memory-efficient inference in relational domains</title>
		<author>
			<persName><forename type="first">P</forename><surname>Singla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Domingos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AAAI</title>
		<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Yago: A core of semantic knowledge</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Suchanek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kasneci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Weikum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">WWW</title>
		<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">SOFIE: A self-organizing framework for information extraction</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Suchanek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sozio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Weikum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">WWW</title>
		<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<title level="m" type="main">URDF: An efficient reasoning framework for uncertain knowledge bases with hard and soft rules</title>
		<author>
			<persName><forename type="first">M</forename><surname>Theobald</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sozio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Suchanek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Nakashole</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
		<respStmt>
			<orgName>Max-Planck-Institute for Informatics</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical report</note>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Scalable probabilistic databases with factor graphs and MCMC</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Wick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mccallum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Miklau</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PVLDB</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="794" to="804" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Trio: A system for integrated management of data, accuracy, and lineage</title>
		<author>
			<persName><forename type="first">J</forename><surname>Widom</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CIDR</title>
		<imprint>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="262" to="276" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
