<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Datalog Rewriting Techniques for Non-Horn Ontologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mark Kaminski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yavor Nenov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernardo Cuenca Grau</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science University of Oxford</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We study the closely related problems of rewriting disjunctive datalog programs and non-Horn DL ontologies into plain datalog programs that entail the same facts for every dataset. We rst propose the class of markable disjunctive datalog programs, which is e ciently recognisable and admits polynomial rewritings into datalog. Markability naturally extends to SHI ontologies, and markable ontologies admit (possibly exponential) datalog rewritings. We then turn our attention to resolution-based rewriting techniques. We devise an enhanced resolution rewriting procedure for disjunctive datalog, and propose a second class of SHI ontologies that admits exponential datalog rewritings via resolution. Finally, we evaluate the feasibility of our techniques over a large corpus of ontologies, with encouraging results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Query answering over DL ontologies is a key reasoning problem for many
applications. Query answering can sometimes be implemented via rewriting into
datalog, where a rewriting of a query Q w.r.t. an ontology O is a datalog program
P that preserves the answers to Q for any dataset. Rewriting queries into
datalog not only ensures tractability in data complexity|an important requirement
in data-intensive applications|but also enables the reuse of scalable rule-based
reasoners such as OWLim [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Oracle's Data Store [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], and RDFox [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        Datalog rewriting techniques have been investigated in depth for Horn DLs,
and optimised algorithms have been implemented in systems such as Requiem
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], Clipper [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and Rapid [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Techniques for non-Horn DLs, however, have
been studied to a lesser extent. Atomic queries were shown datalog rewritable
w.r.t. DL-Litebool ontologies [
        <xref ref-type="bibr" rid="ref1 ref5">1, 5</xref>
        ]. In contrast, datalog rewritings may not exist
for ontologies expressed in the basic non-Horn DL E LU [
        <xref ref-type="bibr" rid="ref12 ref5">12, 5</xref>
        ], and su cient
conditions that ensure rewritability were identi ed in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Datalog rewritability
of atomic queries w.r.t. SHI ontologies was proved NExpTime-complete in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Our focus is on atomic queries. In this setting, rewritability for ontologies is
strongly related to the rewritability of disjunctive datalog programs into datalog:
every SHIQ ontology can be transformed into a disjunctive program that entails
the same facts for each dataset (thus preserving answers to atomic queries) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], we proved a characterisation of datalog rewritability for disjunctive
programs based on linearity: a restriction that requires each rule to contain at
most one IDB atom in the body. We showed that every linear disjunctive
program can be polynomially rewritten into plain datalog; conversely, every datalog
program can be polynomially translated into an equivalent linear disjunctive
datalog program. Motivated by this characterisation, we proposed weakly linear
disjunctive datalog : a language that extends both datalog and linear disjunctive
datalog, and which admits polynomial datalog rewritings. In a weakly linear
program, the linearity requirement is relaxed: instead of applying to all IDB
predicates, it applies only to those that \depend" on a disjunctive rule.
      </p>
      <p>
        A di erent approach to rewriting disjunctive programs into datalog by means
of a resolution-based procedure was proposed in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The procedure works by
saturating the input disjunctive program P such that in each resolution step at least
one of the premises is a non-Horn rule; if this process terminates, the procedure
outputs the subset of datalog rules in the saturation, which is guaranteed to be
a rewriting of P. The procedure was shown to terminate for so-called simple
disjunctive programs; furthermore, it was shown that DL-Litebool ontologies can
be transformed into disjunctive programs that satisfy the simplicity condition.
      </p>
      <p>
        In this paper, we present several improvements over existing techniques, and
evaluate their feasibility in practice over a large corpus of non-Horn ontologies.
In Section 3, we propose the class of markable disjunctive datalog programs, in
which the weak linearity condition from [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is further relaxed. We show that our
extended class of programs is e ciently recognisable and that each markable
program admits a polynomial datalog rewriting. These results can be readily applied
to ontology reasoning. We rst consider the \intersection" between OWL 2 and
disjunctive datalog (which we call RLt), and show that fact entailment over RLt
ontologies corresponding to a markable program is tractable in combined
complexity (and hence no harder than in OWL 2 RL). We then lift the markability
condition to ontologies, and show that markable SHI-ontologies admit a
(possibly exponential) datalog rewriting. In Section 4, we re ne the resolution-based
rewriting procedure from [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] by further requiring that only atoms involving
disjunctive predicates can participate in resolution inferences. This re nement can
signi cantly reduce the number of inferences drawn during saturation, without
a ecting correctness. We then turn our attention to ontologies, and propose an
extension of the logics in the DL-Litebool family that admits (possibly
exponential) datalog rewritings. Our empirical results show that many realistic non-Horn
ontologies can be rewritten into datalog. Furthermore, we have tested the
scalability of query answering over the programs obtained using our techniques, with
promising results. Proofs are delegated to a technical report [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>We consider standard notions of terms, atoms, literals, formulae, sentences, and
entailment. A fact is a ground atom and a dataset is a nite set of facts. We
assume that equality is an ordinary predicate and that each set of
formulae contains the axiomatisation of as a congruence relation for its signature.
Clauses, substitutions, most general uni ers (MGUs), clause subsumption,
tau1: dn</p>
      <p>i=1 Ai v Fjm=1 Cj
2: 9R:A v B
3: A v Self(R)
4: Self(R) v A
5: R v S
6: R v S
7: R S v T
8: A v
9: A v
m R:B
m R:B</p>
      <p>
        Vin=1 Ai(x) ! Wjm=1 Cj(x)
R(x; y) ^ A(y) ! B(x)
A(x) ! R(x; x)
R(x; x) ! A(x)
R(x; y) ! S(x; y)
R(x; y) ! S(y; x)
R(x; z) ^ S(z; y) ! T (x; y)
A(x) ! 9 my:(R(x; y) ^ B(y))
A(z) ^ Vim=0 R(z; xi) ^ B(xi) ! W0 i&lt;j m xi
xj
tologies, binary resolution, and factoring are as usual [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Clause C -subsumes
D if C subsumes D and C has no more literals than D. Clause C is redundant in
a set of clauses if C is tautological or if C is -subsumed by another clause in the
set. A condensation of a clause C is a minimal subset that is subsumed by C.
      </p>
      <p>A rule r is a function-free sentence 8x8z:['(x; z) ! (x)] where tuples of
variables x and z are disjoint, '(x; z) is a conjunction of distinct equality-free
atoms, and (x) is a disjunction of distinct atoms. Formula ' is the body of r,
and is the head. Quanti ers in rules are omitted. We assume that rules are safe.
A rule is datalog if (x) has at most one atom, and it is disjunctive otherwise. A
program P is a nite set of rules; it is datalog if it consists only of datalog rules,
and disjunctive otherwise. We assume that rules in P do not share variables.
For convenience, we treat &gt; and ? in a non-standard way as a unary and a
nullary predicate, respectively. Given a program P, P&gt; is the program with a
rule Q(x1; : : : ; xn) ! &gt;(xi) for each predicate Q in P and each 1 i n, and
a rule ! &gt;(a) for each constant a in P. We assume that P&gt; P and &gt; does
not occur in head position in P n P&gt;. We de ne P? as consisting of a rule with
? as body and empty head. We assume P? P and no rule in P n P? has an
empty head or ? in the body. Thus, P [ D j= &gt;(a) for every a in P [ D, and
P [ D is unsatis able i P [ D j= ?. Head predicates in P n P&gt; are intensional
(or IDB ) in P. All other predicates (including &gt;) are extensional (EDB ). An
atom is intensional (extensional) if so is its predicate. A rule is linear if it has
at most one IDB body atom. A program P is linear if all its rules are.</p>
      <p>We assume familiarity with DLs. W.l.o.g. we consider normalised axioms as
in Table 1. An ontology O is SHIQ if each axiom of type 7 satis es R = S = T ;1
it is SHI if it is SHIQ, it does not contain axioms of type 9, and each axiom of
type 8 satis es m = 1; it is ALCHI if it is SHI and it has no axiom of type 7; it
is RLt if it does not contain axioms of type 8, and it is RL if it is RLt and m = 1
for each axiom of type 1 and 9. Programs obtained from RLt ontologies have
rules with bounded number of variables: fact entailment is PTime-complete for
RL and co-NP-complete for RLt (in combined complexity).2
1 SHIQ enforces additional restrictions to ensure decidability, which we omit here.
2 RLt and RL allow for nominals, which we omit. All our results immediately extend.</p>
      <p>P0 = fC(x) ! B(x) _ G(x)</p>
      <p>G(y) ^ E(x; y) ! B(x)
B(y) ^ E(x; y) ! G(x)
E(y; x) ! E(x; y) g
(2)
(3)
(4)</p>
      <p>C
(1)
G
(3)
(2)
(3)</p>
      <p>E
(4)</p>
      <p>
        An (atomic) query is a function-free atom. A (disjunctive) program P is a
rewriting of F w.r.t. a set of predicates S if for each dataset D over the signature
of F and every fact over S we have F [ D j= i P [ D j= . Program P is
a rewriting of F if it is a rewriting w.r.t. the set of all predicates in F , in which
case it preserves answers to all atomic queries. Hudstadt et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] developed an
algorithm for transforming a SHIQ ontology into a disjunctive program that
preserves entailment of facts over non-transitive relations. This technique was
extended in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to preserve all facts. Thus, every SHIQ ontology O admits a
disjunctive datalog rewriting DD(O), which can be of exponential size.
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Datalog Rewritings Based on Linearity</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] we proposed the class of weakly linear programs (WL), which extends both
datalog and linear disjunctive datalog. In a WL program predicates are
partitioned into disjunctive (i.e., those whose extension may depend on a disjunctive
rule) and datalog (those that depend solely on datalog rules). A program is WL
if all rules have at most one occurrence of a disjunctive predicate in the body.
De nition 3.1. The dependency graph GP = (V; E; ) of a program P is the
smallest edge-labeled digraph such that:
      </p>
      <sec id="sec-3-1">
        <title>1. V contains every predicate occurring in P;</title>
        <p>2. r 2 (P; Q) whenever P; Q 2 V , r 2 P n P&gt;, P occurs in the body of r, and</p>
        <p>Q occurs in the head of r; and
3. (P; Q) 2 E whenever (P; Q) is nonempty.</p>
        <p>A predicate Q depends on a rule r 2 P if GP has a path that ends in Q and
involves an r-labeled edge. Predicate Q is datalog if it only depends on datalog
rules; otherwise, Q is disjunctive. Program P is weakly linear (WL for short)
if each rule body in P has at most one occurrence of a disjunctive predicate.</p>
        <p>Consider the disjunctive program P0 and its dependency graph depicted in
Figure 1. Predicate C is EDB, predicates B and G depend on Rule (1) and hence
are disjunctive, whereas E depends only on Rule (4) and hence it is datalog. Each
rule has at most one disjunctive body atom and the program is WL.</p>
        <p>
          WL programs admit a polynomial rewriting [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Roughly speaking, they are
translated into datalog by \moving" all disjunctive body atoms to the head and
all disjunctive head atoms to the body while replacing their predicates with fresh
ones of higher arity; the new predicates are \initialised" using additional rules.
Markable Programs We next propose the class of markable disjunctive
datalog programs, which extends WL programs. A key feature of a markable program
is that one can identify a subset of disjunctive predicates, called marked
predicates, such that the program can be translated into datalog by \moving" only
those disjunctive atoms in a rule whose predicates are marked.
        </p>
        <p>De nition 3.2. Let P be a disjunctive program. A marking of P is a set M of
disjunctive predicates in P such that:
1. Every rule in P has at most one body atom Q(t) with Q 2 M .
2. Every rule in P has at most one head atom Q(t) with Q 2= M .
3. If Q 2 M and P is reachable from Q in GP , then P 2 M .</p>
        <p>A predicate Q is marked by M if Q 2 M . An atom is marked if so is its predicate.
A disjunctive program is markable if it has a marking.</p>
        <p>Markability generalises weak linearity in the following sense.</p>
        <p>Proposition 3.3. A disjunctive program P is WL if and only if the set of all
disjunctive predicates in P constitutes a marking of P.</p>
        <p>Let P1 extend P0 with the following rules:</p>
        <p>V (x) ! C(x) _ U (x)
(5)</p>
        <p>C(x) ^ U (x) ! ?
(6)</p>
        <p>The dependency graph is given next. Note that C, U , B, and G are disjunctive
as they depend on Rule (5). Thus, (6) has two disjunctive body atoms and P1
is not WL. The program, however, has markings fC; B; Gg and fU; B; Gg.</p>
        <p>U
(5)
(6)</p>
        <p>V
?
(5)
(6)</p>
        <p>C
(1)
(1)</p>
        <p>B
G
(2)
(3)
(2)
(3)</p>
        <p>E
(4)</p>
        <p>
          The following proposition establishes that checking markability is tractable
and amenable to e cient implementation via reduction to 2-SAT.
Proposition 3.4. Markability can be checked in polynomial time.
Datalog Rewritability of Markable Programs We now show that markable
programs are rewritable into datalog by means of a quadratic translation M ,
which extends the translation for weakly linear programs given in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>Consider P1 and the marking M = fB; G; U g. We introduce fresh binary
predicates BY , GY , and U Y for every disjunctive predicate Y . Intuitively, if
a fact BG(c; d) holds in M (P1) [ D then proving B(c) su ces for proving
G(d) in P1 [ D (or, in other words, we have P1 [ D j= B(c) ! G(d)).
Analogously, for the unmarked disjunctive predicate C we introduce fresh binary
predicates CY for each disjunctive predicate Y ; these predicates have a di erent
intuitive interpretation: if a fact CU (c; d) holds in M (P1) [ D then P1 [ D
implies C(c) _ U (d). To \initialise" the extension of the fresh predicates we
need the following rules for every X 2 M and every disjunctive predicate Y .</p>
        <p>These rules encode the intended meaning of the auxiliary predicates. For
example, Rule (8) states that if X(c) holds for some constant c and this is
su cient to prove Y (d) for some d, then Y (d) holds. The key step is to \ ip"
the direction of all rules in P1 involving the marked predicates B, G and U by
moving all marked atoms from the head to the body and vice versa while at the
same time replacing their predicates with the relevant auxiliary predicates. Thus,
Rule (2) leads to the following rules in M (P1) for each disjunctive predicate Y :
These rules are natural consequences of Rule (2) under the intended meaning of
the auxiliary predicates: if we can prove a goal Y (z) by proving rst B(x), and
E(x; y) holds, then by Rule (2) we deduce that proving G(y) su ces to prove
Y (z). In contrast to (2), Rule (1) contains no disjunctive body atoms. We \ ip"
this rule as follows, for each disjunctive predicate Y :
Similarly to the previous case, this rule follows from Rule (1): if C(x) holds and
we can establish that Y (y) can be proved from B(x) and also from G(x), then
Y (y) must hold. In contrast to marked atoms, unmarked atoms are not moved.
So, Rules (5) and (6) yield the following rules for each disjunctive predicate Y :
And indeed, these rules are consequences of Rule (5) and (6), respectively, under
the intended meaning of the auxiliary predicates: V (x) and U (x) ! Y (y) imply
C(x) _ Y (y) by Rule (5), while C(x) _ Y (y) and U (x) imply Y (y) by Rule (6).
De nition 3.5. Let P be a disjunctive program, the set of disjunctive
predicates in P n P&gt;, and M a marking of P. For each (P; Q) 2 2, let P Q</p>
        <p>Q
and P be fresh predicates of arity arity(P ) + arity(Q). Then, M (P) is the
datalog program with the rules given next, where ' is the conjunction of all datalog
atoms in a rule, '&gt; is the least conjunction of &gt;-atoms that makes a rule safe,
all predicates Pi, Qj are in , and y; z are disjoint vectors of fresh variables:</p>
      </sec>
      <sec id="sec-3-2">
        <title>1. every rule in P that contains no disjunctive predicates;</title>
        <p>2. a rule '&gt; ^ ' ^ Vjm=1 QjR(tj ; y) ^ Vin=1 P iR(si; y) ! QR(t; y) for every rule
r = ' ^ Q(t) ^ Vjm=1 Qj (tj ) ! Win=1 Pi(si) 2 P n P&gt; and every R 2 , where
Q(t) is the unique marked body atom of r (and hence all Pi(si) are marked);
3. a rule '&gt; ^ ' ^ Vjm=1 QjR(tj ; y) ^ Vin=1 P iR(si; y) ! R(y) for every rule
r = ' ^ Vjm=1 Qj (tj ) ! Win=1 Pi(si) 2 P n P&gt; and each R 2 , where r has
no marked body atoms and no unmarked head atoms;</p>
        <p>The transformation is quadratic and the arity of predicates is at most doubled.
For P1 and the marking M , we obtain the datalog program M (P1) consisting
of the following rules, where X 2 M and Y is disjunctive:</p>
        <p>Theorem 3.6. Let P be a disjunctive program and let M be a marking of P.
Then M (P) is a polynomial datalog rewriting of P.</p>
        <p>
          Being a rewriting of P, M (P) preserves the extension of all predicates in
P. If we only want to query a speci c predicate Q, we can compute a smaller
program, which is linear in the size of P and preserves the extension of Q. If Q
is datalog, each proof in P of a fact about Q involves only datalog rules, and
if Q is disjunctive each such proof involves only fresh predicates XQ and XQ.
Thus, in M we can dispense with all rules involving auxiliary predicates XR
or XR for R 6= Q (if Q is datalog the rewriting has no auxiliary predicates).
Theorem 3.7. Let P be a disjunctive program, M a marking of P, S a set
of predicates in P, and P0 obtained from M (P) by removing all rules with a
predicate XR or XR for R 2= S. Then P0 is a rewriting of P w.r.t. S.
Rewriting Ontologies Our results are directly applicable to RLt. In [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], we
showed tractability of fact entailment for the class of RLt ontologies
corresponding to WL programs. We now extend this result to markable programs.
Theorem 3.8. Checking O [ D j= , for O an RLt ontology that corresponds
to a markable program, is PTime-complete w.r.t. data and combined complexity.
        </p>
        <p>We next lift the markability condition from disjunctive programs to
ontologies. Observe that the notions of dependency graph and markability naturally
extend to sets of rst-order clauses (written as rules where function symbols are
allowed). We de ne a predicate to be disjunctive in O if it is disjunctive in the
set FO of clauses obtained by skolemisation; we call O markable if so is FO; and
we call a set of predicates a marking of O if it is a marking of FO.
Example 3.9. Consider the ontology O1 and its corresponding clauses FO1 :
O1 =fPerson v Man t Woman; Person v 9child:Person;</p>
        <sec id="sec-3-2-1">
          <title>9married:Person v Person; Woman v Person; Man v Persong</title>
          <p>FO1 =fPerson(x) ! Man(x) _ Woman(x); Person(x) ! child(x; f (x));
Person(x) ! Person(f (x)); Person(y) ^ married(x; y) ! Person(x);</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>Woman(x) ! Person(x); Man(x) ! Person(x)g</title>
          <p>Ontology O1 is markable since the set fPerson; Man; Womang is a marking of FO1 .</p>
          <p>
            As already mentioned, every normalised SHI ontology can be rewritten into
disjunctive datalog by means of a resolution-based calculus [
            <xref ref-type="bibr" rid="ref5 ref8">8, 5</xref>
            ]. The following
lemma establishes that binary resolution and factoring preserve markability.
Lemma 3.10. Let M be a marking of a set of clauses F , and let F 0 be obtained
from F using binary resolution and factoring. Then M is a marking of F 0.
          </p>
          <p>
            Thus, markable SHI ontologies admit a (possibly exponential) rewriting.
Theorem 3.11. Let O be a SHI ontology and let M be a marking of O. Let
DD(O) be the disjunctive datalog rewriting of O as in [
            <xref ref-type="bibr" rid="ref5 ref8">8, 5</xref>
            ]. Then M is a marking
of DD(O) and M (DD(O)) is a datalog rewriting of O.
          </p>
          <p>Corollary 3.12. Checking O [ D j= , for O a markable SHI ontology is
PTime-complete w.r.t. data and in ExpTime w.r.t. combined complexity.
4</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Resolution-Based Rewritings</title>
      <p>
        Resolution provides an alternative technique for rewriting disjunctive programs
into datalog [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Procedure 1 saturates the input program P under binary
resolution and positive factoring, with the restriction that two Horn clauses are
never resolved together. The procedure is compatible with redundancy
elimination techniques such as tautology elimination, subsumption and condensation.
If it terminates, the procedure returns the subset of Horn clauses (equivalently,
datalog rules) in the saturation, which is guaranteed to be a rewriting of P.
      </p>
      <p>We show that the separation between disjunctive and datalog predicates
(Definition 3.1) can be exploited to re ne this procedure. The idea is to further re ne
resolution by ensuring that the resolved atoms involve a disjunctive predicate.
De nition 4.1. Compile-Horn-Restricted is obtained from Procedure 1 by adding
to the de nition of R in step 5 the additional restriction that the predicate in
the atoms being resolved must be disjunctive in S.
Correctness of Compile-Horn-Restricted relies on the observation that resolutions
on datalog predicates can always be delegated to the datalog reasoner and hence
do not have to be performed as part of the rewriting process.</p>
      <p>Theorem 4.2. If Compile-Horn-Restricted terminates on a disjunctive program
P with a program P0, then P0 is a datalog rewriting of P.</p>
      <p>
        The class of disjunctive programs over which Compile-Horn-Restricted
terminates is incomparable with the class of markable programs. Furthermore, the
rewritings produced by both approaches are also quite di erent. Markable
programs lead to polynomial rewritings, in which the arity of predicates is increased;
rewritings computed via resolution can be much larger, but since all the datalog
rules in the rewriting are logically entailed by the original program, the arity of
predicates stays the same. In Section 5 we will discuss practical implications.
Rewriting Ontologies The procedure Compile-Horn was shown to terminate
for a class of programs called simple [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]; furthermore, DL-LitebHo;o+l ontologies are
transformed into disjunctive programs that satisfy the simplicity condition using
the algorithm by Hustadt, Motik and Sattler [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. We now extend this result
by devising a su cient condition for datalog rewritability of SHI ontologies
via Compile-Horn-Restricted. Since transitivity axioms can be eliminated from
SHI ontologies by a polynomial transformation while preserving fact entailment
(see [
        <xref ref-type="bibr" rid="ref5 ref8">8, 5</xref>
        ]), it su ces to formulate our condition for ALCHI.3 First, we adapt
the notion of simple rules in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] as follows.
      </p>
      <p>De nition 4.3. An axiom of the form 9R:A v B is simple w.r.t. a set of
predicates S (or S-simple) if A 2= S. An ontology O is S-simple if so is every
axiom of the form 9R:A v B in O.
3 Note that neither Compile-Horn nor Compile-Horn-Restricted are well-suited for
dealing with (axiomatised) equality. Both will diverge on every disjunctive program with
equality due to the congruence axioms P (x) ^ x y ! P (y) with P disjunctive.</p>
      <p>
        Note that ontology O1 from Example 3.9 is not simple w.r.t. its disjunctive
predicates due to axiom 9married:Person v Person. If, however, we replace this
axiom with Man u Woman ! ?, we obtain a simple ontology, which in turn is
no longer markable. The following theorem then generalises the result in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to
a su cient condition for datalog rewritability of ALCHI ontologies.
Theorem 4.4. Let O be an ALCHI ontology that is simple w.r.t. its disjunctive
predicates, and let DD(O) be the disjunctive datalog rewriting of O as in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
Compile-Horn-Restricted terminates on DD(O) with a datalog rewriting of O.
5
      </p>
    </sec>
    <sec id="sec-5">
      <title>Evaluation</title>
      <p>
        We have evaluated whether realistic ontologies can be rewritten into datalog
using our approaches. We analysed 118 ontologies that use disjunctive constructs
from BioPortal, the Protege library, and the corpus in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. To transform
ontologies into disjunctive datalog we used KAON2 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], which succeeded to compute
disjunctive programs for 103 ontologies.4 Out of the 103 disjunctive programs,
32 were WL, and 35 were markable. Furthermore, 26 programs could be
rewritten using Compile-Horn, and 27 could be rewritten using Compile-Horn-Restricted
(within 1min). Furthermore, programs produced by Compile-Horn were on
average 18% larger (w.r.t. the number of rules) than those produced by
CompileHorn-Restricted. Many of the programs obtained by KAON2 contained equality,
and hence could not be rewritten by means of resolution (see Section 4). Hence,
we additionally considered simpli ed versions of the 103 programs where we
removed all rules containing equality. Out of these, 33 turned out to be WL, and 36
were markable; as expected the e ect of equality on linearity-based approaches
is rather minor. In contrast, resolution-based approaches were signi cantly more
e ective than before: Compile-Horn succeeded in 39 cases, and
Compile-HornRestricted in 41 cases. The intersection between the programs rewritable using
markability and resolution turned out to be quite large: in the general case,
there were 16 programs that could be rewritten by only one approach, and in
the equality-free case only 5. Still, taken together, the two approaches succeeded
to rewrite 39 programs (38%) in the general case and 41 programs (40%) in the
equality-free case. Moreover, on average, 73% of the predicates were datalog, and
so could be queried using a datalog engine even if the disjunctive program was
not rewritable. Finally, we found that 20 out of the 103 ontologies were RLt,
out of which 17 were markable. Of the remaining three, two could be rewritten
via resolution.
      </p>
      <p>
        We have also tested scalability of query answering using datalog programs
obtained by our approach. For this, we used UOBM [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and DBpedia, which
come with large datasets. We considered the RLt subset of UOBM, which is
equivalent to a markable program but is not datalog rewritable using
CompileHorn-Restricted, and generated datasets for 1 to 10 universities. DBpedia is a
realistic ontology with a large dataset from Wikipedia. Since DBpedia is Horn,
we extended it with reasonable disjunctive axioms. We used RDFox as a datalog
4 We doctored the ontologies to remove constructs outside SHIQ.
engine. Performance was measured against HermiT [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and Pellet [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. We used
a server with two Intel Xeon E5-2643 processors and 128GB RAM. Timeouts
were 10min for one query and 30min for all queries; a limit of 100GB was
allocated to each task. We ran RDFox on 16 threads. Systems were compared on
individual queries (one for each predicate in the ontology) and on
precomputing answers to all queries. All systems succeeded to answer all queries for U01:
HermiT required 890s, Pellet 505s, and we 52s. Table 2 depicts average times
for datalog and disjunctive predicates, and number of queries on which a system
failed.5 Pellet only succeeded to answer queries on U01. HermiT's performance
was similar for datalog and disjunctive predicates. In our case, queries over the
130 datalog predicates in UOBM (88% of the 148 predicates in UOBM) were
answered instantaneously (&lt;1s); queries over disjunctive predicates were
significantly harder. Finally, due to its size, DBpedia's dataset cannot even be loaded
by HermiT or Pellet. Still, we could compare the rewritings obtained by
marking and Compile-Horn-Restricted. Since Compile-Horn-Restricted cannot compute
rewritings on per query basis, we compared the rewritings for the full
ontologies only. Using RDFox, the rewritings produced by the marking approach and
Compile-Horn-Restricted precomputed the answers for all predicates in 48 and
1.5 seconds, respectively. In this case, the rewriting obtained by
Compile-HornRestricted performs better as it introduces fewer rules. The marking approach
introduces predicates of higher arity, which leads to a larger materialisation.
6
      </p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>We have proposed enhanced techniques for rewriting disjunctive datalog
programs and DL ontologies into plain datalog programs. Our techniques enable
the use of scalable datalog engines for data reasoning. In this paper, we have
focused on rewritings that preserve fact entailment and hence answers to atomic
queries; we are working on rewriting techniques for full conjunctive queries.
Acknowledgements This work was supported by the Royal Society, the EPSRC
projects Score!, Exoda, and MaSI3, and the FP7 project OPTIQUE.
5 Average times do not re ect queries on which a system failed.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Artale</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kontchakov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zakharyaschev</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The DL-Lite family and relations</article-title>
          .
          <source>J. Artif. Intell. Res</source>
          .
          <volume>36</volume>
          ,
          <issue>1</issue>
          {
          <fpage>69</fpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bachmair</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ganzinger</surname>
          </string-name>
          , H.:
          <article-title>Resolution theorem proving</article-title>
          .
          <source>In: Handbook of Automated Reasoning</source>
          , pp.
          <volume>19</volume>
          {
          <fpage>99</fpage>
          . Elsevier and MIT Press (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bienvenu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>ten Cate</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Ontology-based data access: A study through disjunctive datalog, CSP, and MMSNP</article-title>
          . In: PODS. pp.
          <volume>213</volume>
          {
          <issue>224</issue>
          (
          <year>2013</year>
          ), arXiv:
          <fpage>1301</fpage>
          .
          <fpage>6479</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bishop</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiryakov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ognyano</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peikov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tashev</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velkov</surname>
          </string-name>
          , R.:
          <article-title>OWLIM: A family of scalable semantic repositories</article-title>
          .
          <source>Semantic Web</source>
          <volume>2</volume>
          (
          <issue>1</issue>
          ),
          <volume>33</volume>
          {
          <fpage>42</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Motik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Stoilos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <surname>I.</surname>
          </string-name>
          :
          <article-title>Computing datalog rewritings beyond Horn ontologies</article-title>
          .
          <source>In: IJCAI</source>
          . pp.
          <volume>832</volume>
          {
          <issue>838</issue>
          (
          <year>2013</year>
          ), arXiv:
          <fpage>1304</fpage>
          .
          <fpage>1402</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Eiter</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ortiz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simkus</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiao</surname>
          </string-name>
          , G.:
          <article-title>Query rewriting for HornSHIQ plus rules</article-title>
          .
          <source>In: AAAI</source>
          . pp.
          <volume>726</volume>
          {
          <issue>733</issue>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gardiner</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsarkov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Framework for an automated comparison of description logic reasoners</article-title>
          .
          <source>In: ISWC</source>
          . pp.
          <volume>654</volume>
          {
          <issue>667</issue>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hustadt</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Reasoning in description logics by a reduction to disjunctive datalog</article-title>
          .
          <source>J. Autom. Reasoning</source>
          <volume>39</volume>
          (
          <issue>3</issue>
          ),
          <volume>351</volume>
          {
          <fpage>384</fpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kaminski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.</surname>
          </string-name>
          :
          <article-title>Su cient conditions for rst-order and datalog rewritability in ELU</article-title>
          . In: DL. pp.
          <volume>271</volume>
          {
          <issue>293</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kaminski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nenov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuenca Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Datalog rewritability of disjunctive datalog programs and its applications to ontology reasoning</article-title>
          . In: AAAI (
          <year>2014</year>
          ), arXiv:
          <fpage>1404</fpage>
          .
          <fpage>3141</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kaminski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nenov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuenca Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Datalog rewriting techniques for nonHorn ontologies</article-title>
          .
          <source>Tech. rep.</source>
          , Department of Computer Science, University of Oxford (
          <year>2014</year>
          ), https://krr-nas.cs.ox.ac.uk/2014/DL/rewritings/paper.pdf
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Krisnadhi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Data complexity in the EL family of description logics</article-title>
          .
          <source>In: LPAR</source>
          . pp.
          <volume>333</volume>
          {
          <issue>347</issue>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qiu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>G.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Towards a complete OWL ontology benchmark</article-title>
          .
          <source>In: ESWC</source>
          . pp.
          <volume>125</volume>
          {
          <issue>139</issue>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Reasoning in Description Logics using Resolution and Deductive Databases</article-title>
          .
          <source>Ph.D. thesis</source>
          , Univesitat
          <source>Karlsruhe (TH)</source>
          , Karlsruhe, Germany (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nenov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piro</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Olteanu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Parallel materialisation of datalog programs in centralised, main-memory RDF systems</article-title>
          . In: AAAI (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shearer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Hypertableau reasoning for description logics</article-title>
          .
          <source>J. Artif. Intell. Res</source>
          .
          <volume>36</volume>
          ,
          <issue>165</issue>
          {
          <fpage>228</fpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Perez-Urbina</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Tractable query answering and rewriting under description logic constraints</article-title>
          .
          <source>J. Appl. Log</source>
          .
          <volume>8</volume>
          (
          <issue>2</issue>
          ),
          <volume>186</volume>
          {
          <fpage>209</fpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Sirin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuenca Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalyanpur</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Katz</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Pellet: A practical OWL-DL reasoner</article-title>
          .
          <source>J. Web Sem</source>
          .
          <volume>5</volume>
          (
          <issue>2</issue>
          ),
          <volume>51</volume>
          {
          <fpage>53</fpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Trivela</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoilos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chortaras</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamou</surname>
            ,
            <given-names>G.B.</given-names>
          </string-name>
          :
          <article-title>Optimising resolution-based rewriting algorithms for DL ontologies</article-title>
          . In: DL. pp.
          <volume>464</volume>
          {
          <issue>476</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eadon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chong</surname>
            ,
            <given-names>E.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolovski</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Annamalai</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srinivasan</surname>
          </string-name>
          , J.:
          <article-title>Implementing an inference engine for RDFS/OWL constructs and user-de ned rules in Oracle</article-title>
          . In: ICDE. pp.
          <volume>1239</volume>
          {
          <issue>1248</issue>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>