<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Small Datalog Query Rewritings for EL</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Giorgio</forename><surname>Stefanoni</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Oxford</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Boris</forename><surname>Motik</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Oxford</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ian</forename><surname>Horrocks</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Oxford</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Small Datalog Query Rewritings for EL</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">2AEC16837C4AF5FB4DFAEA4CADB7EE3B</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T23:11+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract/>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Description Logics are a key technology in data management scenarios such as Ontology-Based Data Access (OBDA), a paradigm in which a DL ontology is used to provide a conceptual view of the data <ref type="bibr" target="#b0">[1]</ref>. An OBDA system transforms a conjunctive query over the ontology into a query over the data sources <ref type="bibr" target="#b1">[2]</ref>. This transformation is independent of the data, so the OBDA approach can thus be used in settings where the data sources provide read-only access to the data, and where the data changes frequently.</p><p>Most existing OBDA systems are based on the DL-Lite family of lightweight Description Logics <ref type="bibr" target="#b0">[1]</ref>, which is also the basis for the QL profile of the OWL 2 ontology language. Logics in this family have been designed to allow a conjunctive query posed over the ontology to be rewritten as a first order query over the data sources-that is, queries are first-order rewritable. The query rewriting procedure is independent of the data, and the resulting queries can be evaluated using highly scalable relational database technology. To achieve this, however, the expressive power of DL-Lite is very restrictive. This prevents the OBDA approach from being applied in the life science domain, where many ontologies use DLs from the EL family <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>. This family provides the basis for the EL profile of OWL2, and many prominent ontologies, such as SNOMED-CT, were developed using this language.</p><p>The problem of answering conjunctive queries in EL has already been studied in the literature, and two orthogonal approaches have been proposed. First, Rosati proposed a pure query rewriting technique which transforms an EL TBox T and a conjunctive query q into a Datalog program P T ,q <ref type="bibr" target="#b4">[5]</ref>. Second, Lutz et al. introduced a "combined" approach <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7]</ref>. This technique first materializes certain facts entailed by the ontology in a precomputation step. Then, each user query is rewritten into a polynomial first-order query that, when evaluated over the materialized facts, computes the answers to the user's query.</p><p>Unfortunately, these two approaches exhibit several shortcomings when applied in the context of OBDA. In particular, Rosati's rewriting technique computes for each user query a fresh Datalog program whose size depends on both the query and the terminology, which could be very inefficient when dealing with large scale ontologies. The approach by Lutz et al. produces smaller first order rewritings, but the use of materialization means that the technique is only applicable when the data sources provide read/write access to the data; furthermore, materialization can be inefficient if the data changes frequently.</p><p>This work was supported by EPSRC and Alcatel-Lucent.</p><p>In this paper, we present a pure query rewriting technique to query answering in EL. Our approach reinterprets the combined approach proposed by Lutz and colleagues in terms of Datalog. Our rewriting procedure consists of two distinct steps. The first step rewrites a TBox T into a Datalog program P T , whose size depends linearly on the size of T . Then, at query time, the conjunctive query q is rewritten into a Datalog query Q P , Q C , whose size depends polynomially on q. The two rewriting steps are such that, given an ABox A, deciding whether Q P (a 1 , . . . , a k ) follows from P T ∪ Q C ∪ A is equivalent to deciding whether a 1 , . . . , a k is a certain answer to q over a knowledge base T , A .</p><p>At last, we summarize our main contributions. First, our rewriting approach, unlike Rosati's, separates the rewriting of the TBox and the query into two distinct steps, thus reducing inefficiency when dealing with large ontologies. Second, our technique does not require the materialization of entailed facts, hence our solution is in the spirit of OBDA and it avoids the problems associated with the materialization of large models. Finally, we set the stage for assessing the utility and the applicability to P T ∪ Q C ∪ A of optimized Datalog evaluation techniques, such as magic sets and SLG resolution <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9]</ref>. Indeed, heuristic-based evaluation strategies significantly reduce the number of facts needed to answer a query, thus potentially improving the performance of our rewriting approach. In this paper we provide only proof sketches, and refer the reader to <ref type="bibr" target="#b9">[10]</ref> for full proofs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Preliminaries</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Description Logic EL</head><p>Let N C , N R , N I be pairwise disjoint infinite sets of atomic concepts, atomic roles, and individuals. Together, the sets N C , N R , and N I form the signature of an EL language. Whenever the distinction between atomic concepts and atomic roles is immaterial, we call an element of N C ∪ N R a predicate. The set of EL concept expressions is inductively defined starting from atomic concepts A ∈ N C and atomic roles R ∈ N R according to the syntax rule:</p><formula xml:id="formula_0">C → A | C 1 C 2 | ∃R.C | .</formula><p>An EL TBox T is a finite set of concept inclusions of the form C D; an EL ABox A is a finite set of assertions of the form A(a) or R(a, b) with a and b individuals; and an EL knowledge base (KB) is a tuple K = T , A , where T is an EL TBox and A is an EL ABox. We denote with Ind(A) the set of all individuals occurring in the ABox A. Furthermore, for E either a TBox or an ABox, Pred(E) is the set of all predicates occurring in E.</p><p>Semantics is given as usual in terms of first-order interpretations I = ∆ I , • I , where ∆ I is a nonempty domain and • I is an interpretation function; please refer to <ref type="bibr" target="#b2">[3]</ref> for details. In the following, we will extensively use the notion of an unraveling of an interpretation w.r.t. an ABox. Consider an interpretation I and an ABox A over an arbitrary EL signature. A path p in I w.r.t. A is a nonempty finite sequence </p><formula xml:id="formula_1">c 1 • R 2 • c 2 • • • R n−1 • c n−1 • R n • c n such that c 1 ∈ {a I | a ∈ Ind(A)} and for all 1 ≤ i ≤ n − 1 we have that c i , c i+1 ∈ R I i+1 for R i+1 ∈ N R .</formula><formula xml:id="formula_2">∆ J = paths A (I) a J = a I A J = {p ∈ paths A (I) | tail(p) ∈ A I } R J = { a J , b J | R(a, b) ∈ A} ∪ { p, p • R • c | {p, p • R • c} ⊆ paths A (I)}</formula><p>In this paper, we deal only with normalized EL TBoxes. Let A 1 , A, and B be arbitrary concepts from N C ∪ { }. We say that an EL TBox T is in normal form if each axiom in T is in one of the following forms:</p><formula xml:id="formula_3">A B, A A 1 B, A ∃R.B, or ∃R.A B.</formula><p>Given an arbitrary EL TBox T , we can compute a normalized TBox T norm of T in linear time <ref type="bibr" target="#b2">[3]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Querying EL KBs</head><p>Let N V be an infinite set of variables disjoint from N I . Together, N V and N I form the set N T of terms. A first-order query q is a first-order formula constructed from the terms in N T and the predicates from N C ∪ N R <ref type="bibr" target="#b7">[8]</ref>. In general, we write q = ψ( x) to express that q is the FO formula ψ whose answer variables are x = {x 1 , . . . , x k }. A query with k answer variables is a k-ary query. A conjunctive query (CQ) is a FO query of the form q = ∃ y.ψ( x, y), where ψ is a conjunction of unary atoms A(s) and binary atoms R(s, t) with s and t terms. The variables y are the quantified variables of q. In the following, avar(q) is the set of answer variables of q, and qvar(q) is the set of quantified variables. Finally, N V (q) is the set of all variables occurring in q, and N T (q) is the set of all terms occurring in q. Let q = ψ( x) be a k-ary FO query with x = x 1 , . . . , x k and let I be an interpretation. We say that a k-ary tuple of individuals a 1 , . . . , a k is an answer to q in I, written I |= q[a 1 , . . . , a k ], if I satisfies q under the mapping π which sets π(x i ) = a i for all 1 ≤ i ≤ k. We call π a match for q in I witnessing a 1 , . . . , a k , written I |= π q. We say that a 1 , . . . , a k is a certain answer to q over K if I |= q[a 1 , . . . , a k ], for all models I of K. We denote the set of all certain answers to q over K with cert(q, K). Rosati in <ref type="bibr" target="#b4">[5]</ref> showed that deciding whether a tuple of individuals is a certain answer to q over K is Ptime-complete w.r.t. data-complexity (i.e., w.r.t. the size of the ABox); Ptime-complete w.r.t. KB complexity (i.e., w.r.t. the size of K); and, NP-complete w.r.t. combined complexity (i.e., w.r.t. the size of both K and q).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Datalog</head><p>Let N B be a nonempty set of built-in predicates <ref type="bibr" target="#b10">[11]</ref>. Then, a Datalog rule r is an expression of the form</p><formula xml:id="formula_4">S( u) ← S 1 ( u 1 ), . . . , S n ( u n ), B n+1 ( u n+1 ) . . . , B m ( u m ),</formula><p>where n, m ≥ 0, {S, S 1 , . . . , S n } ⊆ N C ∪ N R , {B n+1 , . . . , B m } ⊆ N B , and u and u i are tuples of terms of suitable length. A rule is safe if each variable occurring in u ∪ u n+1 ∪ . . . ∪ u m also occurs in u 1 ∪ . . . ∪ u n . Atom S( u) is the head of the rule, and atoms S 1 ( u 1 ), . . . , B m ( u m ) constitute the body of the rule. Whenever the body of a rule r is empty, we call r a fact, and we write the rule as S( u). A Datalog program P is a set of safe Datalog rules. Finally, sch(P ) is the set of predicates occurring in P .</p><p>Next, we define the semantics of a Datalog program P using Herbrand interpretations <ref type="bibr" target="#b7">[8]</ref>. The Herbrand Universe of P is the set of all individuals occurring in P . The Herbrand Base of P is the set of all facts that can be constructed from the predicates in N C ∪ N R and the individuals in the universe of P . A Herbrand interpretation I of P is a subset of the Herbrand Base of P . Note that I does not interpret built-in predicates. As usual, we assume that these predicates are evaluated over a fixed, possibly infinite Herbrand interpretation B <ref type="bibr" target="#b11">[12]</ref>. Then I is a model of P w.r.t. B if, for all the rules r in P , we have that</p><formula xml:id="formula_5">I ∪ B |= ∀ x(B m ( u n ) ∧ . . . ∧ B n+1 ( u n+1 ) ∧ S n ( u n ) ∧ . . . ∧ S 1 ( u 1 ) → S( u)),</formula><p>where x is a tuple consisting of all variables occurring in the rule. The semantics of a Datalog program P is defined as the minimal Herbrand interpretation I satisfying P w.r.t. B, written M B (P ). Whenever the program does not contain built-in predicates, we do not consider the interpretation B and we simply write M(P ). The semantics of Datalog programs can be defined also by means of a fixpoint construction. Then, T P is the immediate consequence operator that maps instances I over sch(P ) to instances over sch(P ) as follows. For each rule r in P , if there exists a match π for S 1 ( u 1 ) ∧ . . .</p><formula xml:id="formula_6">∧ S n ( u n ) ∧ B n+1 ( u n+1 ) ∧ . . . ∧ B m ( u m ) in I ∪ B, then S(a 1 , . . . , a k ) is contained in T P (I) with a i = π(u i ) for each u i ∈ u.</formula><p>One can prove that T P has a minimum fixpoint T ω P such that T ω P = M B (P ) <ref type="bibr" target="#b7">[8]</ref>. Finally, a Datalog query is a tuple</p><formula xml:id="formula_7">Q P , Q C where Q P is a predicate symbol and Q C is a Datalog program. A tuple of individuals a 1 , . . . , a k is an answer to Q P , Q C over Datalog program P if P ∪ Q C |= Q P (a 1 , . . . , a k ).</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Datalog Rewriting for EL TBoxes</head><p>In this section, we show how to transform an EL TBox T into a Datalog program P T whose size depends linearly on T . The transformation is such that, for an arbitrary EL ABox A, we can use the unraveling of M(P T ∪A) to compute the answers to conjunctive queries over T , A . Let T be a TBox over an arbitrary EL signature. Intuitively, for each axiom α occurring in T , the program P T contains a set of Datalog rules which encode the constraint imposed by α. To achieve this, we have to overcome two issues.</p><p>First, EL concept inclusions of the form A ∃R.B require the use of either existential quantifications or Skolem terms in rule heads; however, Datalog does not allow neither of the two. In order to solve this issue, we use a technique that has been introduced for representing canonical models of EL knowledge bases <ref type="bibr" target="#b2">[3]</ref>. That is, for each atomic concept B occurring in T we introduce a fresh Second, EL allows for to occur in concept expressions. Hence, we need to define in P T a unary predicate , whose extension-given an ABox A-coincides with the Herbrand universe of P T ∪ A. To achieve this, we restrict our study to a subset of all EL ABoxes. In particular, we consider only those ABoxes A such that Pred(A) ⊆ Pred(T ). That is, each predicate occurring in the ABox A must occur also in the TBox T . Then, in our Datalog program, for each atomic concept A and for each atomic role R occurring in T , we add the following rules:</p><formula xml:id="formula_8">Axiom α Set of rules Θ(α) B B(X) ← A(X) A1 A2 B B(X) ← A1(X), A2(X) ∃R.A B B(X) ← R(X, Y ), A(Y ) A ∃R.B R(X, oB) ← A(X) B(oB) ← A(X)</formula><formula xml:id="formula_9">(X) ← A(X); (X) ← R(X, Y ); (Y ) ← R(X, Y ).</formula><p>This is only one of the several ways in which we can encode such a predicate. In fact, another possibility would be-as suggested by Rosati in <ref type="bibr" target="#b4">[5]</ref>-to assume that each ABox A contains an assertion (a) for each individual a ∈ Ind(A). We believe that in the context of OBDA-where the focus is to provide access to arbitrary data-sources-it is important to make as few assumptions as possible on the physical realization of the ABox. For this reason, we prefer the option presented above. Next, we formalize the transformation of a TBox T into a Datalog program P T . Let Aux = {o A | A ∈ N C } ∪ {o } be a set of auxiliary individuals distinct from N I . Then, the program P T is constructed from terms in N T ∪ Aux and predicates in N C ∪ N R ∪ { } as follows. The transformation uses the function Θ, shown in Figure <ref type="figure" target="#fig_0">1</ref>, to transform each axiom in the (normalized) TBox T into a set of Datalog rules. The Datalog program P T is then defined as follows.</p><formula xml:id="formula_10">P T = α∈T Θ(α) A∈Pred(T )∩N C (X) ← A(X) R∈Pred(T )∩N R (X) ← R(X, Y ), (Y ) ← R(X, Y )</formula><p>The following result readily follows from the definition of the program.</p><p>Proposition 1. For an arbitrary EL TBox T , Datalog program P T can be computed in time linear in the size of T .</p><p>an arbitrary EL ABox A. Next, we prove that the unraveling U w.r.t. A of M(P T ∪A) can be used to answer conjunctive queries over K = T , A . We do so in two distinct steps. First, we introduce the notion of chase of an EL knowledge base K. Second, we show that the chase of K is isomorphic to U.</p><p>The chase of an EL knowledge base K = T , A , written chase(K), is a possibly infinite Herbrand interpretation defined inductively by starting from A and then applying axioms occurring in the TBox to assertions occurring in the ABox. In our definition of the chase, we use function terms to denote existentially quantified individuals. Hence, the definition of ABox assertion is extended in a natural way to accommodate for assertions over function terms. We denote with u and w terms that can be either individuals or function terms. Next, we define an operator Γ T that chases an ABox by applying the axioms occurring in the TBox T . In the definition, we use assertions of the form (u) to assert that u is a member of the EL concept expression . For S an arbitrary ABox, Γ T (S) is the smallest ABox containing S and closed under the following chasing rules.</p><formula xml:id="formula_11">(cr1) If {A(u)} ⊆ S and A B ∈ T , then {B(u)} ⊆ Γ T (S). (cr2) If {A 1 (u), A 2 (u)} ⊆ S and A 1 A 2 B ∈ T , then {B(u)} ⊆ Γ T (S). (cr3) If {R(u, w), A(w)} ⊆ S and ∃R.A B ∈ T , then {B(u)} ⊆ Γ T (S). (cr4) If {A(u)} ⊆ S and A ∃R.B ∈ T , then {R(u, f (u, R, B)), B(f (u, R, B))} ⊆ Γ T (S). (cr5) If u occurs in S, then { (u)} ⊆ Γ T (S).</formula><p>We now define an infinite sequence of finite ABoxes A i for i ∈ N.</p><formula xml:id="formula_12">A 0 = A A i+1 = Γ T (A i )</formula><p>Finally, the chase of K is the infinite union of all such ABoxes A i .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>chase(K)</head><formula xml:id="formula_13">= i∈N A i</formula><p>It is clear that our construction of the chase of K is fair. In fact, for each i ∈ N we have that A i+1 is the result of exhaustively applying-to all possible assertions occurring in A i -all applicable axioms in T . At last, we want to point out that chase(K) can be used to compute the certain answers to a CQ q over K.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Proposition 2 ([5]</head><p>). Let K be an EL knowledge base. Further, let q be a k-ary conjunctive query. Then, for each k-ary tuple of individuals a 1 , . . . , a k , we have</p><formula xml:id="formula_14">a 1 , . . . , a k ∈ cert(q, K) if and only if chase(K) |= q[a 1 , . . . , a k ].</formula><p>So, by proving that the unraveling U of M(P T ∪A) is isomorphic to chase(K), we establish that U can be used to answer conjunctive queries over K. To prove the structural equivalence of U and chase(K), we define a function h mapping paths occurring in U to terms chase(K). We define h by induction on the depth of paths occurring in U as follows.</p><p>Base Case. Consider an arbitrary p ∈ ∆ U with dep(p) = 1. We set h(p) := p.</p><formula xml:id="formula_15">Inductive Step. Let p = t 1 • R 2 • t 2 • • • t n−1 • R n • t n be a path occurring in U such that h(p) has not been defined yet, but h(t 1 • • • R n−1 • t n−1 ) = u.</formula><p>We distinguish between two cases depending on the type of the individual t n .</p><p>1. If t n occurs in the ABox, we set h(p</p><formula xml:id="formula_16">) := t n . 2. If t n is of the form o B , we set h(p) := f (u, R n , B).</formula><p>Theorem 1 shows that h is an isomorphism between the two structures. Intuitively, for the only-if direction, we show that h is an injective homomorphism from U to chase(K) by induction on the depth of paths occurring in U; for the if-direction, by induction on the construction of chase(K) we prove that h is a surjective function and that it is a homomorphism from chase(K) to U. Theorem 1. Function h is an isomorphism from U to chase(K).</p><p>Since the unraveling of M(P T ∪ A) is generally infinite, this result alone does not provide us with an algorithm for answering queries in EL. In the next section, we show how to rewrite a user query q into a Datalog query</p><formula xml:id="formula_17">Q P , Q C such that P T ∪ A ∪ Q C |= Q P (a 1 , .</formula><p>. . , a k ) if and only if a 1 , . . . , a k ∈ cert(q, K) and thus solve the problem.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Polynomial Query Rewriting in Datalog</head><p>In the previous section, we have seen that for an arbitrary EL KB K = T , A evaluating a conjunctive query q over the unraveling of the Herbrand model of P T ∪ A is equivalent to computing the certain answers to q over K. In this section, we develop a query rewriting procedure that reduces the computation of cert(q, K) to the problem of evaluating a suitably constructed Datalog query over P T ∪ A. We achieve this in two steps. First, we present an interesting property of a certain class of interpretations. Second, we show how this result can be used to develop a query rewriting procedure in our Datalog setting.</p><p>We use the notions of A-connected and split interpretations from <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b12">13]</ref>. Let I be an interpretation and let A be an ABox over an arbitrary EL signature. We say that I is A-connected if, for each domain element c ∈ ∆ I , there exists a path p ∈ paths A (I) such that tail(p) = c. Furthermore, I is a split interpretation if, for all domain elements c, c ∈ ∆ I , we have that c ∈ {a</p><formula xml:id="formula_18">I | a ∈ Ind(A)} and c, c ∈ R I imply c ∈ {a I | a ∈ Ind(A)}. Intuitively, in an A-connected interpretation I, for each domain element c n it is always possible to find a path a I • R 2 • c 2 • • • R n • c n such that a ∈ Ind(A).</formula><p>Furthermore, if I is split, then each domain element that is not the image of an individual can be related by a role only with elements that themselves do not interpret individuals.</p><p>Then, let J be the unraveling w.r.t. A of a split and A-connected interpretation I and let q be a conjunctive query. Lutz et al. in <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b12">13]</ref> showed that it is possible to reduce the problem of answering q in J to evaluating a first-order query rewriting q * of q over I. Roughly speaking, the query rewriting q * rules some spurious answers for q in I that cannot be reproduced in J . More specifically, we have to ensure that the answer variables of q, the variables of q mapped to cyclic portions of I, and the variables of q mapped to nontree portions of I are all matched only to the domain elements in {a I | a ∈ Ind(A)}.</p><p>We now briefly outline how we can construct such an FO rewriting q * for q <ref type="bibr" target="#b5">[6]</ref>. Let ∼ q be the smallest equivalence relation over N T (q) that is closed under the following rule: if R(s, t) and R(s , t ) occur in q and t ∼ q t , then s ∼ q s . Then, for each equivalence class ζ of ∼ q , we let t ζ ∈ ζ be an arbitrary, but fixed, representative of the class. Also, for each such equivalence class ζ and for each atomic role R occurring in q, we let Pred(ζ, R) be the following set.</p><formula xml:id="formula_19">Pred(ζ, R) = {t ∈ N T (q) | R(t, t ) occurs in q with t ∈ ζ}</formula><p>Next, we define three sets of terms that correspond to the above mentioned cases.</p><formula xml:id="formula_20">-Fork = is the set of all pairs Pred(ζ, R), t ζ such that ζ is an equivalence class of ∼ q and |Pred(ζ, R)| &gt; 1.</formula><p>-Fork = is the set of all quantified variables v ∈ qvar(q) for which atoms R(s, v) and S(s , t) exist in q such that R = S and v ∼ q t. -Cyc is the set of all variables v ∈ qvar(q) for which atoms</p><formula xml:id="formula_21">R 0 (t 0 , t 0 ), . . . , R m (t m , t m ), . . . , R n (t n , t n )</formula><p>exist in q such that m, n ≥ 0; for some i ≤ n we have that t i ∼ q v; for each j &lt; n we have that t j ∼ q t j+1 ; and t n ∼ q t m . We are now ready to formally specify the FO query rewriting q * . In the definition, we assume that Aux is a fresh predicate not occurring in q and K and that every interpretation I interprets Aux as ∆ I \ {a I | a ∈ Ind(A)}. Then, formulae q 1 and q 2 are defined as follows.</p><formula xml:id="formula_22">q 1 = v∈avar(q)∪Fork = ∪Cyc ¬Aux(v) q 2 = Pred(ζ,R),t ζ ∈Fork= ¬Aux(t ζ ) ∨ t,t ∈Pred(ζ,R) (t = t )</formula><p>Finally, we set q * = q∧q 1 ∧q 2 . It turns out that q * can be computed in polynomial time w.r.t. q <ref type="bibr" target="#b5">[6]</ref>. In the same paper, Lutz et al. prove the following result. Proposition 3. Let A be an arbitrary EL ABox, let I be a split and A-connected interpretation, and let J be the unraveling of I w.r.t. A. Then, for every k-tuple of individuals a 1 , . . . , a k , we have that</p><formula xml:id="formula_23">I |= q * [a 1 . . . , a k ] if and only if J |= q[a 1 . . . , a k ].</formula><p>This result applies to our Datalog rewriting of EL TBoxes. Indeed, for an arbitrary EL KB K = T , A , we have that M(P T ∪A) is a split and A-connected interpretation. The intuition behind the argument is as follows. We show that M(P T ∪ A) is split by noticing that rules encoded in P T do not allow for the derivation of facts of the form R(o B , a) for a ∈ Ind(A) and o B ∈ Aux. To see that M(P T ∪ A) is A-connected, we just recall that M(P T ∪ A) is minimal and, hence, all the derived facts must be "grounded" w.r.t. the facts in A.</p><p>Theorem 2. Let K = T A be an EL knowledge base. Then, M(P T ∪ A) is a split and A-connected interpretation.</p><p>Hence, for an arbitrary k-ary CQ q and for each k-tuple of individuals a 1 , . . . , a k , we have that M(P T ∪ A) |= q * [a 1 . . . , a k ] if and only if a 1 , . . . , a k ∈ cert(q, K).</p><p>Note that q * is a first-order query, and we are unaware of systems capable of evaluating first-order queries over Datalog programs. Therefore, we next show how to transform q * into a Datalog query Q P , Q C such that a 1 , . . . , a k ∈ cert(q, K) if and only if P T ∪ A ∪ Q C |= Q P (a 1 . . . , a k ). We construct such a Datalog query Q P , Q C by applying to q * a simplified version of the Lloyd-Topor transformation <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref>. Definition 1 (Datalog Rewriting). Let q( x) be a k-ary CQ whose quantified variables are among y; let Cyc, Fork = , and Fork = be as specified above; let</p><formula xml:id="formula_24">Pred(ζ 1 , R 1 ), t 1 ζ , . . ., Pred(ζ n , R n ), t n</formula><p>ζ be an arbitrary enumeration of Fork = ; let p 0 , p 1 , . . . , p n be fresh predicates; and let Named be a built-in with a predetermined, possibly infinite Herbrand interpretation N = {Named(a) | a ∈ N I }. Query Q C then contains the following safe Datalog rules:</p><formula xml:id="formula_25">p 0 ( x, y) ← q, v∈avar(q)∪Fork = ∪Cyc Named(v)<label>(1)</label></formula><formula xml:id="formula_26">p i ( x, y) ← p i−1 ( x, y), Named(t i ζ ) for 1 ≤ i ≤ n<label>(2)</label></formula><formula xml:id="formula_27">p i ( x, y) ← p i−1 ( x, y), t,t ∈Pred(ζ i ,R i ) t = t for 1 ≤ i ≤ n<label>(3)</label></formula><formula xml:id="formula_28">Q P ( x) ← p n ( x, y)<label>(4)</label></formula><p>One may think that the recursive definition of predicates p i for 1 ≤ i ≤ n could be simplified by writing Q P ( x) ← p 0 ( x, y) . . . p n ( x, y) and by defining each p i as:</p><formula xml:id="formula_29">p i ( x, y) ← Named(t i ζ ) p i ( x, y) ← t,t ∈Pred(ζ i ,R i ) t = t</formula><p>Unfortunately, these rules are not safe. Safe rules, on the one hand, provide us with a clear and unambiguous semantics. On the other hand, unsafe rules are also computationally more expensive for bottom-up computation, since each variable in the head may be bound to an arbitrary individual in the universe of the program. For this reason, we prefer our, slightly more involved, solution.</p><p>Proposition 4. For an arbitrary k-ary conjunctive query q, query Q P , Q C can be computed in polynomial time w.r.t. the size of q.</p><p>Proof. We note that ∼ q can be computed in polynomial time w.r.t. the size of q [6] and, therefore, also the sets Cyc, Fork = , and Fork = can be computed in polynomial time w.r.t. q. Furthermore, the size of the body of rule p 0 ( x, y) depends linearly on the size of q, Cyc, and Fork = . Also, for each pair Pred(ζ, R), t ζ in Fork = , the program Q C contains exactly two rules. The size of these two rules depends linearly on the size of Pred(ζ, R), t ζ . Thus, we conclude that Q P , Q C can be computed in polynomial time with respect to the size of q.</p><p>[10], we prove that our rewriting is correct-that is, that answering Q P , Q C over P T ∪ A is equivalent to computing cert(q, T , A). This follows from Proposition 3 and the fact that our Datalog query is the result of transforming the FO rewriting q * along the lines of the Lloyd-Topor transformation. Theorem 3. Let K be an EL knowledge base and let q be a k-ary CQ over K. Then, for every k-tuple of individuals a 1 , . . . , a k , we have that a 1 , . . . , a k ∈ cert(q, K) if and only if</p><formula xml:id="formula_30">P T ∪ A ∪ Q C |= Q P (a 1 , . . . , a k ).</formula><p>Finally, we investigate the complexity of our rewriting procedure. Theorem 4. Let K = T , A be an EL KB, let q be a k-ary CQ, and let a 1 , . . . , a k be a tuple of individuals. We can decide P T ∪A∪Q C |= Q P (a 1 , . . . , a k ) in polynomial time w.r.t. the size of K and in non-deterministic polynomial time with respect to the size of both K and q.</p><p>Proof. We have already argued that the size of Datalog program P T depends linearly on the size of the TBox T and that the Datalog rewriting Q P , Q C can be computed in Ptime w.r.t. q. Also, we note that the arity of predicates and the number of variables occurring in P T ∪ A ∪ Q C do not depend on K. Finally, from an implementation point-of-view (as suggested in <ref type="bibr" target="#b11">[12]</ref>), the built-in predicate Named can be considered as an assertion in the ABox A with a different physical realization: it is not directly stored in the ABox but it is implemented as a procedure which is evaluated during the execution of the program. Clearly, such a procedure can be implemented to run in time polynomial in K. It follows that we can compute the minimal Herbrand model of P T ∪ A ∪ Q C in time polynomial in the size of K <ref type="bibr" target="#b7">[8]</ref>. The membership in NP follows directly from the considerations above and from the fact that we can guess and check in nondeterministic polynomial time a match π for Q P in M(P T ∪ A ∪ Q C ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusions</head><p>In this paper, we introduce a query rewriting approach to query answering in EL. In our approach, the process of computing the certain answers to a CQ q over an EL KB K = T , A is divided into two distinct steps. A first preprocessing step in which the TBox T is transformed into a Datalog program P T , whose size is linear in T . Then, at query time, the query q is independently rewritten into a Datalog query Q P , Q C , whose size is polynomial in q. Finally, computing cert(q, K) amounts to evaluating the Datalog query Q P , Q C over P T ∪ A.</p><p>In future, we plan to extend our approach to deal with ELH dr ⊥ . Lutz et al. have already proposed a combined approach for query answering in this DL <ref type="bibr" target="#b12">[13]</ref>. However, differently from their solution, we would like the Datalog rewriting Q P , Q C to be independent from the role inclusions contained in the TBox. Additionally, we plan to extend our work to cover nominals, which raises the question on how to efficiently handle equality in Datalog <ref type="bibr" target="#b1">[2]</ref>.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Transformation of EL Axioms into Rules.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>We say that a path p has depth n and we write dep(p) = n; furthermore, tail(p) is the last domain element c n in p. Let paths A (I) denote the set of all paths w.r.t. A occurring in I. The unraveling J of I w.r.t. A is the following interpretation.</figDesc><table /></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Ontologies and databases: the DL-Lite approach</title>
		<author>
			<persName><forename type="first">D</forename><surname>Calvanese</surname></persName>
		</author>
		<author>
			<persName><surname>De Giacomo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lembo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lenzerini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Poggi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rodríguez-Muro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rosati</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Semantic Technologies for Informations Systems -5th Int. Reasoning Web Summer School (RW</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Tessaris</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Franconi</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2009">2009. 2009</date>
			<biblScope unit="volume">5689</biblScope>
			<biblScope unit="page" from="255" to="356" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Tractable query answering and rewriting under description logic constraints</title>
		<author>
			<persName><forename type="first">H</forename><surname>Pérez-Urbina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Motik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Applied Logic</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="186" to="209" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Pushing the EL envelope</title>
		<author>
			<persName><forename type="first">F</forename><surname>Baader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Brandt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lutz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IJCAI, Professional Book Center</title>
				<editor>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Kaelbling</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Saffiotti</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="364" to="369" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Pushing the EL envelope further</title>
		<author>
			<persName><forename type="first">F</forename><surname>Baader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Brandt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lutz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the OWLED 2008 DC Workshop on OWL: Experiences and Directions</title>
				<editor>
			<persName><forename type="first">K</forename><surname>Clark</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><forename type="middle">F</forename><surname>Patel-Schneider</surname></persName>
		</editor>
		<meeting>the OWLED 2008 DC Workshop on OWL: Experiences and Directions</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">On conjunctive query answering in EL</title>
		<author>
			<persName><forename type="first">R</forename><surname>Rosati</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Description Logics</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">D</forename><surname>Calvanese</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Franconi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Haarslev</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Lembo</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Motik</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">Y</forename><surname>Turhan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Tessaris</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="volume">250</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Conjunctive query answering in EL using a database system</title>
		<author>
			<persName><forename type="first">C</forename><surname>Lutz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Toman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wolter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the OWLED 2008 Workshop on OWL: Experiences and Directions</title>
				<meeting>the OWLED 2008 Workshop on OWL: Experiences and Directions</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">The combined approach to ontology-based data access</title>
		<author>
			<persName><forename type="first">R</forename><surname>Kontchakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lutz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Toman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wolter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zakharyaschev</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011">2011</date>
			<publisher>IJCAI, AAAI Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Abiteboul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hull</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vianu</surname></persName>
		</author>
		<title level="m">Foundations of databases</title>
				<imprint>
			<publisher>Addison-Wesley</publisher>
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Query evaluation under the well-founded semantics</title>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Warren</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems. PODS &apos;93</title>
				<meeting>the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems. PODS &apos;93<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="1993">1993</date>
			<biblScope unit="page" from="168" to="179" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Small datalog query rewriting for EL</title>
		<author>
			<persName><forename type="first">G</forename><surname>Stefanoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Motik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
		</author>
		<ptr target="http://www.cs.ox.ac.uk/people/giorgio.stefanoni/pubs/2012/tr/dl2012.pdf" />
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
		<respStmt>
			<orgName>University of Oxford</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical report</note>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Ullman</surname></persName>
		</author>
		<title level="m">Principles of database and knowledge-base systems</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Computer Science Press, Inc</publisher>
			<date type="published" when="1988">1988</date>
			<biblScope unit="volume">I</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">What you always wanted to know about datalog (and never dared to ask)</title>
		<author>
			<persName><forename type="first">S</forename><surname>Ceri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Gottlob</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Tanca</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Knowl. Data Eng</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="146" to="166" />
			<date type="published" when="1989">1989</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Conjunctive query answering in the description logic EL using a relational database system</title>
		<author>
			<persName><forename type="first">C</forename><surname>Lutz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Toman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wolter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IJCAI</title>
				<editor>
			<persName><forename type="first">C</forename><surname>Boutilier</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="2070" to="2075" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Making prolog more expressive</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lloyd</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Topor</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Journal of Logic Programming</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="225" to="240" />
			<date type="published" when="1984">1984</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Semantic web methods for knowledge managmement</title>
		<author>
			<persName><forename type="first">S</forename><surname>Decker</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
		<respStmt>
			<orgName>University of Karlsruhe</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">PhD thesis</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
