<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Query Answering on Expressive Datalog± Ontologies</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Mostafa</forename><surname>Milani</surname></persName>
							<email>mmilani@scs.carleton.ca</email>
							<affiliation key="aff0">
								<orgName type="department">School of Computer Science</orgName>
								<orgName type="institution">Carleton University</orgName>
								<address>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Andrea</forename><surname>Calì</surname></persName>
							<email>andrea@dcs.bbk.ac.uk</email>
							<affiliation key="aff1">
								<orgName type="department">Dept of Computer Science Birkbeck</orgName>
								<orgName type="institution">Univ. of London</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="department">Oxford-Man Institute of Quantitative Finance</orgName>
								<orgName type="institution">University of Oxford</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Leopoldo</forename><surname>Bertossi</surname></persName>
							<email>bertossi@scs.carleton.ca</email>
							<affiliation key="aff0">
								<orgName type="department">School of Computer Science</orgName>
								<orgName type="institution">Carleton University</orgName>
								<address>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Query Answering on Expressive Datalog± Ontologies</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">793AE786977CAEE1BC219E8DECBFE891</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T04:47+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract/>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The Datalog ± family of ontology languages <ref type="bibr" target="#b0">[1]</ref>, which extends Datalog with existential quantification, has been gaining importance in the area of Ontology Querying due to its capability of capturing several prominent Semantic Web languages as well as offering efficient query answering services in many variants relevant for applications. The core feature of Datalog ± languages are so-called tuple-generating dependencies (TGDs), which are the main form of rules. Such rules allow the inference of new atoms from an initial set (a database), which is captured by the notion of chase procedure. For example, consider the TGD r(X, Y ) → ∃Z s(X, Z) and a database D constituted by a single atom r(a, b). A chase step will generate the atom s(a, ζ), where ζ is a labelled null, that is, a placeholder for an unknown value; notice that the constant b is lost in this step as it doesn't appear in the new atom. Conjunctive query answering under general TGDs is undecidable; languages of the Datalog ± family impose therefore restrictions on the form of rules so as to guarantee decidability and good computational properties. Guarded (extended by weakly-guarded ) Datalog ± were the first form of decidable Datalog ± languages, inspired by guarded logic and characterised by the presence of a guard atom in each rule, that contains all variables of that rule.</p><p>The language called sticky Datalog ± <ref type="bibr" target="#b0">[1]</ref> was introduced in order to capture a "proper" notion of join in rules, that is, the occurrence of variables in two distinct atoms of a rule in the absence of a guard for that rule. Weakly-sticky Datalog ± is an extension of sticky Datalog ± that extends sticky Datalog ± by also capturing the well-known class of weakly-acyclic TGDs.</p><p>Let Σ be the following set of rules, where some body variables are marked (denoted by a hat sign, e.g. X; see <ref type="bibr" target="#b0">[1]</ref>) as the result of a procedure that identifies positions (arguments of predicates; for instance, p <ref type="bibr" target="#b0">[1]</ref> is the position corresponding to the first argument of the predicate p) where, in the chase procedure, values can appear that can eventually be lost in a subsequent chase step.</p><formula xml:id="formula_0">v(X) → ∃Y r(X, Y ) r(X, Ŷ ), r( Ŷ , Z) → p(X, Z). p( X, Ŷ ) → ∃Z p(Y, Z) p( X, Y ), p(Y, Z) → t(Y, Z)</formula><p>A set of rules is sticky if there is no marked variable in a rule that appears more than once in the body of the rule. Intuitively, stickiness can be defined by means of the following property: during a chase step according to a rule σ, each value corresponding to a variable appearing more than once in σ is not lost in the chase step, and it is also never lost in any subsequent step involving atoms where it appears. Σ is not sticky, as easily seen.</p><p>Weak-acyclicity is defined using the notion of rank of a position <ref type="bibr" target="#b1">[2]</ref>. A position has finite (resp., infinite) rank if the number of labelled nulls that can appear in that position in the chase procedure is finite (infinite). In the Datalog <ref type="bibr" target="#b1">[2]</ref>} is the set of positions with finite rank and</p><formula xml:id="formula_1">± program Σ above, Π F = {v[1], r[1], r</formula><formula xml:id="formula_2">Π ∞ = {p[1], p[2]</formula><p>} is the set of positions with infinite rank. A Datalog ± program is weakly-acyclic if all positions of predicates have finite rank. The program Σ is not weakly-acyclic.</p><p>A set of rules is weakly-sticky if every marked variable that is repeated in the body of a rule appears at least once in a position with finite rank. This generalizes stickiness with acyclicity because the stickiness is only applied on values that appear in the positions with infinite rank. Here, Σ is weakly-sticky. Specifically in the second rule, the repeated variable Y appears in the positions r <ref type="bibr" target="#b0">[1]</ref> and r <ref type="bibr" target="#b1">[2]</ref> in Π F . In the last rule, Y is a repeated variable but not marked.</p><p>So far, no non-trivial deterministic algorithm for CQ answering has been devised for weakly-sticky Datalog ± . In this paper we set the basis for the implementation of conjunctive query (CQ) answering by the following contributions.</p><p>1. We propose a bottom-up technique, where we ground the rules of Σ according to the database D in a suitable way. The expansion instance terminates at a point dependent on the size of the query; the query can be then evaluated directly on the expanded instance. 2. We propose a hybrid approach to CQ answering as follows. First, we transform all positions in Π F into positions of rank 0 (that is, positions where only constants can appear); then certain variables in such positions are grounded, that is, the variables are replaced by selected constants of the initial instance. The obtained program is a sticky program, for which a well-known query rewriting technique for CQ answering can be applied.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">A Grounding Approach</head><p>A first deterministic algorithm for query answering on a weakly-sticky Datalog ± program Σ and a database is obtained by grounding the rules of Σ in a bottomup fashion. The procedure is inspired by the version of the chase procedure in <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref>. We illustrate this technique by means of an example. Consider a database D = {p(a, b), v(b)}, a Boolean conjunctive query (BCQ) q = {u(X)}, and a set of WS rules Σ as follows: σ</p><formula xml:id="formula_3">1 : p(X, Y ) → ∃Z p(Y, Z) and σ 2 : v(X), p(X, Y ), p(Y, Z) → u(Y ).</formula><p>We start from D and iteratively generate ground rules by mapping via homomorphism the body of the rules in Σ into D or the head of the rules in Σ (the current set of ground rules). The basic procedure is as follows: we iteratively add a ground rule to Σ if its head atom is not homomorphic to the head of a rule already in Σ , until no new rule can be added. This "cautious" procedure, similar to the chase procedure in <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref>, guarantees its termination. In our example, the algorithm stops after adding only one ground rule p(a, b) → p(b, ζ 1 ) to Σ , where ζ 1 is a labelled null.</p><p>In order to complete our grounding, we resume the above basic procedure M q times, where M q is the number of existential variables in q. Before each resumption, we freeze the labelled nulls, which after having been frozen are considered as constants. In our example, we have only one more resumption, which adds the rules, p(b,</p><formula xml:id="formula_4">ζ 1 ) → p(ζ 1 , ζ 2 ) and v(b), p(b, ζ 1 ), p(ζ 1 , ζ 2 ) → u(ζ 1 ).</formula><p>Resumptions are needed, intuitively, to capture applications of rules (in a chase procedure) where a join variable, appearing in two or more distinct atoms, are mapped to a labelled null. Now, the number of resumptions depends on the query, which makes our grounding dependent on the query; however, for practical purposes, we could ground with N resumptions so as to be able to answer queries with up to N existential variables, and if a query with more than N existential variables is to be answered, we can incrementally retake the already-computed grounding and add the required number of resumptions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">A Hybrid Approach</head><p>In this section, we propose an algorithm based on a hybrid approach that combines the grounding of specific variables in rules with the query rewriting algorithm from <ref type="bibr" target="#b2">[3]</ref>. Given a set of WS rules Σ and a database D, our algorithm transforms Σ according to D into a sticky set of rules Σ . Sticky Datalog ± enjoys first-order rewritability, that is, each CQ can be rewritten into a first-order query and directly answered on the database <ref type="bibr" target="#b2">[3]</ref>, providing the correct answer.</p><p>As an example, consider D = {v(a), s(a, b)} and a set of WS rules Σ that includes σ 1 and σ 2 from the example in Section 2 as well as the following rules:</p><formula xml:id="formula_5">σ 3 : v(X) → ∃Y r(X, Y ). σ 4 : r(X, Y ), s(X, Z) → t(Y, Z). σ 5 : r(X, Y ), t(Y, Z) → p(X, Z).</formula><p>Let us call weak rules the rules in which some repeated marked body variables appear at least once in a position with finite rank; we call such variables weak variables. We aim at grounding the weak variables only, thus turning the WS program into a sticky program.</p><p>In our example, σ 4 and σ 5 are weak rules with X and Y as their weak variables respectively. σ 4 is partially grounded as a sticky rule r(a, Y ), s(a, Z) → r(Y, Z) since X can only take the value a from D. To (partially) ground σ 5 we would need Y to be replaced with a null value invented by σ 3 . Notice that the variables we are to ground can only be in positions with finite rank, which guarantees the finiteness of the partial grounding. Our algorithm consists of two phases.</p><p>(Phase 1). We remove the existential variables that are in the positions with finite rank with a method inspired by <ref type="bibr" target="#b3">[4]</ref>. In the context of our example, we first Skolemize σ 3 , that is the only rule with an existential variable in a position with finite rank. The result is v(X) → r(X, f (X)). Next, we replace r(X, f (X)) with an expanded predicate r (X, f, X) with the higher arity that results into a rule v(X) → r (X, f, X) in which f is a fresh constant that represents the replaced function. Then, we iteratively propagate the expanded predicate in the body and head of the rules, possibly expanding other predicates e.g. t, obtaining the new rules: σ 3 : v(X) → r (X, f, X), σ 4 : r (X, Y, Y ), s(X, Z) → t (Y, Y , Z), and σ 5 : r (X, Y, Y ), t (Y, Y , Z) → p(X, Z). The result is a set of rules Σ 0,∞ = {σ 1 , σ 2 , σ 3 , σ 4 , σ 5 } with positions of rank either zero or infinite.</p><p>(Phase 2). Now, we proceed with the grounding step where we replace the weak variables (X in σ 4 and Y, Y in σ 5 ) with values from D and the new constant f. The result is a set of sticky rules Σ that includes σ 1 , σ 2 and σ 3 as well as the following rules, σ 4 : r (a, Y, Y ), s(a, Z) → t (Y, Y , Z), and σ 5 : r (X, f, a), t (f, a, Z) → p(X, Z).</p><p>To compute the answers to the query q, Σ and q are rewritten into a firstorder query using the rewriting method for sticky rules <ref type="bibr" target="#b2">[3]</ref>. The first-order query is then evaluated on D.</p></div>		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusion</head><p>Weakly-sticky Datalog ± is an expressive ontology language with good computational properties and capable of capturing the most prominent Semantic Web languages. We proposed two deterministic algorithms for answering conjunctive queries on weakly-sticky Datalog ± . This, we believe, sets the basis for practical query answering algorithms for real-world scenarios. We plan to continue our work by running experiments on large data sets. We also intend to refine the hybrid algorithm by devising solutions to reduce the number of constants of the database used to ground the weak variables; this would improve the efficiency of our approach.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Towards more expressive ontology languages: The query answering problem</title>
		<author>
			<persName><forename type="first">A</forename><surname>Cali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Gottlob</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pieris</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">193</biblScope>
			<biblScope unit="page" from="87" to="128" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Data Exchange: Semantics and Query Answering</title>
		<author>
			<persName><forename type="first">R</forename><surname>Fagin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">G</forename><surname>Kolaitis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Miller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Popa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">TCS</title>
		<imprint>
			<biblScope unit="volume">336</biblScope>
			<biblScope unit="page" from="89" to="124" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Query Rewriting and Optimization for Ontological Databases</title>
		<author>
			<persName><forename type="first">G</forename><surname>Gottlob</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Orsi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pieris</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proc. TODS</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page">25</biblScope>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Extending Decidable Existential Rules by Joining Acyclicity and Guardedness</title>
		<author>
			<persName><forename type="first">M</forename><surname>Krötzsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rudolph</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. IJCAI</title>
				<meeting>IJCAI</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="963" to="968" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Efficiently Computable Datalog ± Programs</title>
		<author>
			<persName><forename type="first">N</forename><surname>Leone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Manna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Terracina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Veltri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. KR</title>
				<meeting>KR</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="13" to="23" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Tractable Query Answering and Optimization for Extensions of Weakly-Sticky Datalog+</title>
		<author>
			<persName><forename type="first">M</forename><surname>Milani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bertossi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. AMW</title>
				<meeting>AMW</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
