=Paper= {{Paper |id=Vol-2373/paper-30 |storemode=property |title=Practical Datalog Rewriting for Existential Rules |pdfUrl=https://ceur-ws.org/Vol-2373/paper-30.pdf |volume=Vol-2373 |authors=Zhe Wang,Peng Xiao,Kewen Wang |dblpUrl=https://dblp.org/rec/conf/dlog/WangXW19 }} ==Practical Datalog Rewriting for Existential Rules== https://ceur-ws.org/Vol-2373/paper-30.pdf
    Practical Datalog Rewriting for Existential Rules

                        Zhe Wang, Peng Xiao, and Kewen Wang

                                Griffith University, Australia



       Abstract. Existential rules is an expressive ontology formalism for ontology-
       mediated query answering and as a trade-off, query answering is of high com-
       plexity, while several tractable fragments have been identified. Existing sys-
       tems are based on first-order rewriting methods and the resulting queries can
       be too large for DBMS to handle. It is shown that datalog rewriting can re-
       sult in more compact queries but the previous study is mostly theoretical. A
       practical datalog rewriting algorithm is still missing. In this paper, we fill the
       gap by proposing a practical datalog rewriting algorithm for conjunctive query
       answering over existential rules, and establish its correctness over well known
       fragments of existential rules. Experiments on our prototype system showed
       superior or comparable performance over state-of-the-art systems on both the
       compactness of rewritings and the efficiency of query answering.


1    Introduction
Existential rules (a.k.a. Datalog± and tuple generating dependencies) [2, 6] is a family
of expressive ontology languages. It attracted intensive interest lately due to its ex-
pressive power covering datalog and many Horn description logics, including the core
dialects of DL-Lite and EL [5], which underlay the OWL 2 Profiles. This makes ex-
istential rules an appealing formalism for ontology-mediated query answering. While
the query answering problem is undecidable over the full formalism, several interesting
fragments have been proposed [4, 6, 17] that support tractable query answering.
    A major approach for query answering over ontologies expressed in description
logics or existential rules is query rewriting. Given an ontology Σ and a query q, a
rewriting method transforms them into another query qΣ , which is sometimes in a
different query formalism, such that answering qΣ can be handled by conventional
database management systems (DBMSs) and at the same time preserves the an-
swers to the original ontology-mediated query. The rewriting approach is particularly
promising as it allows ontology-mediated query answering to be implemented on top
of existing highly-optimised database query engines. While many algorithms and sys-
tems have been developed for various description logics [18, 7, 25, 23], particularly for
DL-Lite and EL [15, 20, 22, 12], it is challenging to extend them to more general exis-
tential rules, as existential rules allow predicates of arbitrary arities (instead of only
unary and binary predicates) and variable permutations in the rules.
    Existing systems for query answering over existential rules are typically based on
first-order rewritings [8, 13, 14], i.e., qΣ is a first-order query. A limitation of such
an approach is that it can only handle ontologies and queries that are first-order
rewritable. Well-accepted first-order rewritable classes are the linear and sticky exis-
tential rules [6]. Yet many practical ontologies do not necessarily fall into these classes,
2       Zhe Wang, Peng Xiao, and Kewen Wang

such as some ontologies formulated in EL. Even for ontologies and queries that are
first-order rewritable, the results of rewriting can suffer from a significant blow-up
and become difficult for DBMSs to handle [19].
     On the other hand, taking datalog as the target query language can lead to much
more compact rewritings and it is shown for description logics that executing (non-
recursive) datalog rewritings is much more feasible for DBMSs than equivalent first-
order rewritings [12]. All ontologies and queries that are first-order rewritable are
trivially datalog rewritable, and more datalog rewritable classes are known, such as
the guarded existential rules [9]. However, existing research on datalog rewriting of
existential rules are mostly theoretical [10, 3] (refer to [1] for a detailed discussion).
While several algorithms and systems have been developed for datalog rewriting for
various description logics [7, 12], to the best of our knowledge, a practical system is
still missing for datalog rewriting over more general existential rules.
     In this paper, we fill the gap by presenting both a practical algorithm and a
prototype system for datalog rewriting and query answering over existential rules. Our
algorithm is based on the notion of unfolding [24] and the observation that unfolding
can generate a datalog rewriting whenever it exists. Yet instead of naive unfolding,
to achieve compactness of rewriting, we separate the results of unfolding into short
rules by introducing fresh predicates and reusing such predicates when possible. Such a
separation technique not only reduces the length of generated rules but also facilitates
structure sharing (via the reuse of predicates). While such a rewriting process may
not terminate, we move on to identify classes of ontologies where the rewriting process
terminates and produce correct datalog rewritings, including both a novel class and
several existing well-accepted classes. After that, we introduce a practical algorithm
for computing the datalog rewriting through constructing rewriting graphs, which
again facilitates structure sharing and can be seen as a decomposed representation of
rewritings. Finally, we implemented our algorithm as a prototype system and evaluate
it against the state-of-the-art systems for both existential rules and description logics
on commonly used benchmark ontologies and queries.


2   Preliminaries
Let Nc , Nn , and Nv be disjoint, countably infinite sets of constants, (labelled) nulls,
and variables respectively. A term is either a constant, null, or variable. We assume
standard first-order logic notions, such as predicates, (ground) atoms, formulas, en-
tailment (|=) and equivalence (≡). An instance is a (possibly infinite) set of atoms
that contains only constants and nulls. A dataset is a finite ground instance. For a
formula or an instance ϕ, var(ϕ) denotes the variables in ϕ.
    An existential rule (or a rule) r is a formula of the form

                             ∀x.∀y.[∃z.ϕ(x, z) ← ψ(x, y)]

where x, y and z are pairwise disjoint vectors of variables, and ϕ(x, z) and ψ(x, y)
are conjunctions of atoms with variables from respectively x∪z and x∪y. We assume
rules do not contain constants. Variables in x are called frontier variables, denoted
fr(r). and those in z are called existential variables, denoted ext(r). Formula ϕ is the
head of the rule r, denoted head(r), and formula ψ is the body of r, denoted body(r);
                                   Practical Datalog Rewriting for Existential Rules         3

again, they can be seen as (existentially quantified conjunctions of) sets of atoms. For
brevity, universal quantifiers in a rule are often omitted, and we sometimes express
rule r as head(r) ← body(r). A datalog rule r is an existential rule whose head consists
of a single atom and ext(r) is empty.
    A conjunctive query (CQ) q is a first-order formula of the form ∃y.ϕ(x, y), where
x and y are tuples of variables and φ is a conjunction of atoms containing only
constants and variables from x ∪ y. Sometimes it is convenient to treat it as a datalog
rule Q(x) ← ϕ(x, y) where Q is a fresh predicate with arity |x|. If x is empty then q
is a Boolean CQ (BCQ). Answering CQs can be reduced to that of BCQs and hence
w.l.o.g. we consider only BCQs in this paper. For convenience, we assume a fixed
renaming between variables and labelled nulls, which allows us to identify a BCQ
with the set of its atoms where all the variables are renamed as nulls, and vice versa.
A union of BCQs (UCQ) is a disjunction of BCQs, which is seen as a finite set of
finite instances.
    An ontology-mediated query (OMQ) is a pair Q = (Σ, q) with Σ a finite set of
rules and q a BCQ. The answer of Q over a dataset D is true if Σ ∪ D |= q, and false
otherwise. A datalog rewriting of an OMQ (Σ, q) is a of the form (Π, Q) where Π is a
datalog program and Q is a nullary predicate, such that for any dataset D, Σ ∪ D |= q
iff Π ∪ D |= Q. We call it a strong datalog rewriting if additionally Σ ∪ {q} |= Π. Q
is the query predicate of q and sometimes denoted as Qq . When Π consists of only
non-recursive datalog rule with Q in their heads, the rewriting is a UCQ rewriting.
    The UCQ rewriting of OMQs can be obtained through the notion of piece unifica-
tion for a given BCQ q and a rule r [16]. First, for a subset q 0 of q, a variable occurring
both in q 0 and in q \ q 0 is a separating variable for q 0 in q. A piece unifier of q and r is
a triple µ = (B, H, τ ), where ∅ ⊂ B ⊆ q, H ⊆ head(r), and τ is a most general unifier
between B and H such that the following condition is satisfied for each z ∈ ext(r): If
a term t (t 6= z) in B ∪ H is unified with z, i.e., zτ = tτ , then t is not a separating
variable for B in q.
    For a set X of variables and a substitution σ, σ|X denotes the restricted substi-
tution to domain X. Another substitution σ 0 is a safe extension of σ|X on a set of
variables Y , if σ 0 coincides with σ|X and for each y ∈ Y , yσ 0 is a fresh variable.


3    Datalog Rewriting for OMQs

In this section, we will introduce a compact datalog rewriting approach, based on the
notion of unfolding for existential rules [24]. A rule r can be unfolded by a rule r0 if
there exists a piece unifier µ = (B, H, τ ) of body(r) and r0 , and the result of unfolding
r by r0 with µ is the following rule denoted unf µ (r, r0 ):

               ∃z.[head(r)τ 00 ∪ head(r0 )τ 0 ] ← (body(r) \ B)τ ∪ body(r0 )τ 0            (1)

where τ 0 is a safe extension of τ |var(H) on var(r0 ) \ var(H), τ 00 is a safe extension of
τ |fr(r) on ext(r), and z consists of all the variables in the head but not in the body.
The result of exhaustive unfolding for Σ is denoted unfold(Σ).
      Note that if a strong datalog rewriting exists for an OMQ, then there exists a
subset of its unfolding that consists of datalog rules and is such a datalog rewriting.
4           Zhe Wang, Peng Xiao, and Kewen Wang

Lemma 1. For an OMQ (Σ, q), a strong datalog rewriting exists iff there exists a
finite subset Π ⊆ unfold(Σ ∪ {q}) such that (Π, Qq ) is such a datalog rewriting.
    Clearly, a naive method to compute a datalog rewriting using the above unfolding
is impractical, as the datalog rules obtained from unfolding can be very large (indeed,
are often of unbounded sizes). In what follows, we introduce a practical approach for
datalog rewriting by breaking up long datalog rules into smaller ones. It is known that
every OMQ (Σ, q) can be transformed into an OMQ (Σ 0 , q) whose rule heads have
single atoms in a way that preserves query answering [8]. For convenience, in what
follows, we assume Σ consists of rules of single-atom heads. As a first step, we present
an alternative operator, which we simply call rewriting, between two existential rules.
Definition 1. For two rules r, r0 and a piece unifier µ = (B, H, τ ) of body(r) and
r0 , the result of rewriting r by r0 with µ, denoted rewµ (r, r0 ), consists of the following
rules
                            P(xr0 )τ ← body(r0 )τ 0 ,                                          (2)
                            ∃z.R(v) ← (body(r) \ B)τ ∪ {P(xr0 )τ },                            (3)
                                                                       00          0   0
                                    α ← R(v) for each α ∈ head(r)τ ∪ head(r )τ ,               (4)
                     00         0   0
       ∃z.[head(r)τ ∪ head(r )τ ] ← (body(r) \ B)τ ∪ {P(xr0 )τ },                              (5)
where τ , τ are as in Equation (1), v is a tuple of variables in head(r)τ ∪ head(r )τ 0 ,
        0     00                                                              00           0

z consists of all the variables in the head but not in the body, and P and R are fresh
predicates with arities |xr0 | and |v|.
Note that rule (5) can be captured by rules (3) and (4), yet it is included for the
correctness of rewriting in some cases. We mark rule (5) as temporary which will be
deleted after the rewriting is completed, and we will discuss in next section when the
generation of such temporary rules are not necessary. Also, for rules of the form (3)
that are not datalog rules, we also mark them for deletion.
    We can use rewµ (r, r0 ) to replace unf µ (r, r0 ) in the unfolding of rules and it is not
hard to see that the resulting set of rules are equivalent w.r.t. query answering. Indeed,
if we allow fresh predicates P and R to be further unfolded, then the rewriting process
generates strictly more rules than unfold(Σ). However, it is clear that allowing the
unfolding of fresh predicates forfeits their purpose, as they were introduced to break
up long rules and their unfolding simply reverses it. Hence, it is natural to disallow
unfolding of such fresh predicates.
    Furthermore, it also does not make sense to introduce a different fresh predicate
each time when the “same” piece unifier, i.e., up to variable renaming, is applied.
This is achieved through a label function λ(·) that maps each fresh predicate to a set
of atoms. Formally, for predicates P and R in Definition 1, λ(P) = Hτ = head(r0 )τ 0
and λ(R) = γ(head(r))τ 00 ∪ γ(head(r0 ))τ 0 , where γ(A(x)) = A(x) if A is a (non-fresh)
predicate occurring in the initial rules and otherwise γ(A(x)) = λ(A).
    Finally, special handling for rewriting a query q by another rule r0 is needed.
In particular, the handling of head atoms is simpler, such that rewµ (q, r0 ) replaces
rules (3)–(5) with the following rules
                            Q ← (body(q) \ B)τ ∪ {P(xr0 )τ },                                  (6)
                            Q ← (body(q) \ B)τ ∪ body(r0 )τ 0 .                                (7)
                                 Practical Datalog Rewriting for Existential Rules          5

Again, rule (7) can be captured by rules (2) and (6), yet it is included for the cor-
rectness of rewriting in some cases. It is also temporary rules to be deleted after the
rewriting is completed.
    We are ready to define the rewriting of a set of existential rules Σ.
Definition 2. The rewriting of a set of existential rules Σ, denoted rewrite(Σ) is
the (possibly infinite) set of rules obtained by exhaustively applying rewriting between
each pair of rules r, r0 as in Definition 1 under the following conditions and by deleting
temporary and non-datalog rules at the end:

 – H does not contain any fresh predicate;
 – If r is a query (i.e., has Q in the head), its rewriting follows (2), (6) and (7);
 – For each introduced fresh predicate A ∈ {P, R} in an atom of the form A(x), if
   there is an atom A0 (x0 ) with A0 another fresh predicate of the same type introduced
   in previous rewriting steps such that there is a most general unifier σ between λ(A)
   and λ(A0 ) with xσ = x0 σ, then reuse A0 to replace A.
 – Eliminate rule r if there is another rule r0 and a homomorphism σ from body(r0 )
   to body(r) such that a safe extension σ 0 of σ exists with head(r0 )σ 0 = head(r).

   We use rewrite|R (Σ) (resp., rewriteR (Σ)) with R ⊆ {(2), . . . , (7)} to denote the
variant of rewrite(Σ) where temporary rules in R are not generated (resp., only rules
in R are generated) during rewriting.

Example 1. Consider Σ1 = {r1 : B(y) ← A(x, y), r2 : ∃y.A(x, y) ← B(x)} and q =
Q ← A(x, y) ∧ A(y, z). The (one step) rewriting of r1 by r2 results the following rules:

  r3 : P1 (x) ← B(x),      r4 : ∃y.R1 (x, y) ← P1 (x),          r5 : A(x, y) ← R1 (x, y),
  r6 : B(y) ← R1 (x, y),   r7 : ∃y.[A(x, y) ∧ B(y)] ← P1 (x),

where λ(P1 ) = {A(x, y)} and λ(R1 ) = {A(x, y), B(y)}.
   Rewriting q by r2 , by r1 , and then by r2 leads to (not a complete list of) rules:

 r8 : Q ← A(x, y) ∧ P1 (y),   r9 : Q ← A(x, y) ∧ B(y),
 r10 : P2 (y) ← A(z, y),      r11 : Q ← A(x, y) ∧ P2 (y),   r12 : Q ← A(x, y) ∧ A(z, y),
 r13 : Q ← P1 (x),            r14 : Q ← B(x).

Note that P1 is reused; also, rules r13 and r14 cannot be obtained without keeping
temporary rules r9 and r12 during the rewriting.

   Now we establish the soundness and completeness of our datalog rewriting.
Proposition 1. For an OMQ Q = (Σ, q), let Π = rewrite(Σ ∪ {q}) (resp., Π =
rewrite|(5) (Σ ∪ {q}) or Π = rewrite|(7) (Σ ∪ {q})). If Π is finite then (Π, Qq ) is a
datalog rewriting of Q.

Proof (Sketch). We want to show for each dataset D, Σ ∪D |= q iff Π ∪D |= Qq . First,
we only need to establish “if” direction for Π = rewrite(Σ ∪{q}), which can be checked
inductively through the unfolding of rules in Π. In particular, consider the unfolding
of (reused) fresh predicates in two rules r1 and r2 of the form Q ← B1 ∪ {A(x)} and
6           Zhe Wang, Peng Xiao, and Kewen Wang

A(x) ← B2 , where A is a fresh predicate and B1 , B2 do not contain fresh predicates.
Suppose λ(A) = H(y, z) with some substitution σ such that yσ = x, then by the
definition of rewriting and unfolding, there are rules (up to variable renaming) Q ←
B1 ∪ H 0 σ with H 0 ⊆ H and Hσ ← B2 from unfold(Σ ∪ {q}). Note that the predicate
reuse condition in Definition 2 ensures that H 0 σ is a piece for unification. Thus, the
result of unfolding r1 by r2 is in unfold(Σ ∪ {q}). This can be extended to the cases
where B1 , B2 contain fresh predicates.
    Also, we only need to show the “only if” direction for Π = rewrite|(5) (Σ ∪ {q})
and Π = rewrite|(7) (Σ ∪ {q}), and we only need to show for each rule r ∈ unfold(Σ ∪
{q}) with head Q, r also occurs in unfold(Π) (up to variable renaming). We only
demonstrate the proof for Π = rewrite|(7) (Σ ∪ {q}). We show this by an induction on
forward chaining, i.e., consider an unfolding chain: (i) r1 , . . . , rn ∈ Σ ∪ {q}, (ii) for
i ≥ 2, ri,...,1 is the result of unfolding ri by ri−1,...,1 , and (iii) rn = q. For the first two
rules r1 , r2 in the chain, r2,1 is of the form (1), and it has corresponding results of
                                           (5)      (2)                                        (5)
rewriting, rules (5) and (2), denoted r2,1 and r2,1 . The head of r2,1 is preserved in r2,1
for further unfolding and the body of r2,1 can be reconstructed through (unfolding)
 (5)        (2)                                                (5)
r2,1 and r2,1 . For each rule ri (i ≥ 2), there is a rule ri,...,1 in the process of rewriting
                       (5)         (2)                                                       (5)
that rewrites it to ri,...,1 and ri,...,1 , such that the head of ri,...,1 is preserved in ri,...,1
for further unfolding and the body of ri,...,1 can be reconstructed through (unfolding)
 (5)          (2)                                                                           (5)
ri,...,1 and ri,...,1 . Finally, the body of rn,...,1 can be reconstructed by unfolding rn,...,1
      (2)
and ri,...,1 for all 1 ≤ i ≤ n in Π. That is, r = rn,...,1 is in unfold(Π).


4    Datalog Rewritable Classes
The process of rewriting introduced in the previous section does not necessarily ter-
minate; for example, it does not terminate on Σ2 = {∃z.A(z, x) ← A(x, y)}. It is not
hard to see that the termination issue is caused by the generation of temporary rules.
If we disallow the generation of temporary rules, the rewriting always terminates.
Lemma 2. For a set of rules Σ, the computation of rewrite|(5),(7) (Σ) always termi-
nates and the result is finite.
This can be seen as follows. Suppose the maximum number of atoms in the initial
rule bodies is m. For each rule of the forms (2)–(4) and (6), it has a single atom head
and its body has at most m atoms. Also, for each fresh predicate P, λ(P) consists of a
single head atom of an initial rule (up to variable renaming). Similarly, for each fresh
predicate R, λ(R) consists of at most m head atoms of initial rules, as each rewriting
step may add one atom to the head and rule (3) can be rewritten at most m times
(noting that unfolding of fresh predicates is disallowed).
    In what follows, we discuss several classes of rules on which (finite) datalog rewrit-
ings can be obtained by restricting the generation temporary rules. We start by defin-
ing a novel class called separable rule sets, which is through adapting a marking
procedure from [17].
    We use zr to denote an existential variable z in rule r. For a set of rules Σ, an atom
α and a variable x occurring at position i in α, we mark the position with a minimum
set of variables M (i, α) satisfying the following conditions: (i) if head(r) = α for some
                                  Practical Datalog Rewriting for Existential Rules       7

r ∈ Σ with x ∈ ext(r) then M (i, α) = {xr }; (ii) if head(r) = α for some r ∈ Σ with
x ∈ fr(r) then M (i, α) is the intersection of all M (j, β) s.t. β ∈ body(r) and x occurs
at position j in atom β; (iii) if α ∈ body(r) then M (i, α) is the union of M (i, head(r0 ))
for all r0 ∈ Σ s.t. r can be unfolded by r0 with α unified with head(r0 ).
Definition 3. A set of rules Σ is separable if for each rule r ∈ Σ, there do not exists
a variable x shared by two body atoms such that the intersection of all M (i, α), for
each atom α ∈ body(r) and each position i that x occurs in α, is non-empty.
It is not hard to see that Σ2 is separable.
    The class of shy rule sets is proposed in [17]. Intuitively, a set of rules is shy if no
existential variable occurs in a position of a relation where a join variable occurs in a
rule body, neither can two variables occurring in such (existential-variable) positions
of the body also occur in the same head atom.
Proposition 2. Each rule set that is shy is also separable.
This can be seen from that any rule set violating separability also violates Condition 1
of shyness [17]. However, the converse of the proposition is not necessarily true and
Σ2 ∪ {B(x, y) ← A(x, u) ∧ A(y, v)} is such a case.
Proposition 3. For a OMQ (Σ, q) such that Σ ∪ {q} is separable, rewrite(2),(6) (Σ ∪
{q}) is a datalog rewriting of (Σ, q).
Intuitively, the proof is based on the fact that the unfolding of rules in Σ ∪ {q} only
involves a single atom in the body each time, and thus the result of unfolding can be
reconstructed from the result of rewriting even if the bodies are separated.
    The class of sticky rule sets is introduced in [6], which are shown to be first-order
rewritable. The key idea is that fresh variable generated during backward chaining
cannot occur in two separate atoms [21]. The two classes of sticky and separable rules
are incomparable; for instance, Σ3 = {A(x, y) ← A(x, z) ∧ A(z, y)} is separable but
not sticky, and Σ1 ∪ {C(x, y, z) ← A(x, y) ∧ A(y, z)} (where Σ1 is from Example 1) is
sticky but not separable.
Proposition 4. For a OMQ (Σ, q) such that Σ is sticky, rewrite(2),(6),(7) (Σ ∪ {q}) is
a datalog rewriting of (Σ, q).
Indeed, the proposition holds for all finite unification sets [2], as the backward chaining
always terminates on such rule sets and hence rewrite(2),(6),(7) (Σ ∪ {q}) (in particular,
rules of the form (7)) is finite. Moreover, if we disallow the reuse of fresh predicates,
the resulting datalog rewriting is a non-recursive one.
    Finally, we are also interested in the class of acyclic rule sets, for which various
notions of acyclicity have been proposed [11], which guarantees the termination of
forward chaining (i.e., on certain chase procedures).
Proposition 5. For a OMQ (Σ, q) such that Σ is acyclic, rewrite|(7) (Σ ∪ {q}) is a
datalog rewriting of (Σ, q).
It can be seen from that the acyclicity conditions guarantee the Skolem chase ter-
minates on critical instances [11], which avoids the generation of infinitely many
Skolemised terms. In our case, it avoids the generation of infinitely many existen-
tial variables in the heads of rules (5), whereas the sizes of their bodies are always
bounded. Hence, the number of such rules is finite, and rewrite|(7) (Σ ∪ {q}) is finite.
8       Zhe Wang, Peng Xiao, and Kewen Wang

5   A Rewriting Algorithm
In this section, we introduce a practical algorithm for computing datalog rewritings
of the form rewrite|(5) (Σ), which can be configured to compute rewrite|(5),(7) (Σ) and
rewrite(2),(6),(7) (Σ). It can also be easily adapted to compute rewrite|(7) (Σ). Inspired
by [12], we compute a decomposed representation of the heads and bodies of the result-
ing rules from unfolding, such that (i) the representation is compact due to structure
sharing, (ii) the datalog rewriting rewrite|(5) (Σ) can be conveniently extracted from
such a representation, and (iii) the complete rules from unfolding (e.g., rule (7)) can
be reconstructed via backtracking.
    For a set of rules Σ whose maximum number of body atoms is m, we define a
rewriting graph to be a direct graph (N, E) whose nodes in N are of the form (R, I),
where R and I each consists of at most m atoms, and whose edges in E are labelled
with piece unifiers. Each node n = (R, I) is associated with two fresh predicates Pn
and Rn , and we reuse fresh predicates as in Section 3 through a label function λ(·).
For each node n = (R, I), let un = fr(R ← I) and v n = var(R). Intuitively, an edge
from node n = (R, I) to node n0 = (R0 , I 0 ) labelled with µ = (B, H, τ ) represents a
set of rules (i.e., rules (2)–(4) and (6)) obtained during rewriting as follows:

         Pn0 (un0 ) ← I 0 ,                           corresponding to (2)
     ∃z.Rn0 (v n0 ) ← (I \ B)τ ∪ {Pn0 (un0 )},        corresponding to (3) and (6)
                 α ← Rn0 (v n0 ) for each α ∈ R0 ,    corresponding to (4)

where z consists of all the variables in the head but not in the body. For a rewriting
graph G, dlg(G) is the set of datalog rules obtained as above from the edges of G.
    Now, we present the algorithm for computing rewriting graphs. Note that similar
as the rule elimination in Definition 2, for a new node n = (R, I), if there is an existing
node n0 = (R0 , I 0 ) and a homomorphism σ from I 0 to I such that a safe extension σ 0
of σ exists with R ⊆ R0 σ 0 , then n should be eliminated. In Algorithm 1, actions with
a ∗ are subject to the reuse of fresh predicates and node elimination. Also, τ 0 and τ 00
denote safe extensions of τ as in Equation (1).
    In Algorithm 1, we first initialise the nodes to be the pairs of heads and bodies
of existing rules (line 2) and use a queue Γ to store the nodes representing rules
to be rewritten (line 4 onwards). Since a piece unifier cannot contain a fresh pred-
icate (by Definition 2 and guaranteed in line 7), only original rules or rules of the
forms (2), (3), (6) and (7) need to be rewritten, and only original rules or rules of
the form (4) with head α can rewrite such rules. Lines 9–21 are the rewriting process,
with lines 10–12 corresponding to the rewriting of an original rule or a rule (2), and
lines 13–15 corresponding to that of rules (3) and (6). The rewriting produces 2 new
node (subject to predicate reuse and node elimination): node n00 represents the gener-
ated rules (2)–(4) and (6), and is added to N . It is added to the queue for the further
rewriting of rule (2); node n† is not added to N but is added to the queue for the
further rewriting of rules (3) and (6). Note that temporary rules of the form (7) need
to be generated to ensure the correctness. The generation of their representations is
achieved through backtracking (lines 22–27). It is not hard to verify that rule (7)
corresponds to ∃z.Rn (v n ) ← S.
    We establish the correctness of Algorithm 1 as follows.
                                     Practical Datalog Rewriting for Existential Rules                  9


    Algorithm 1: Compute Rewriting Graphs
   input : A set of existential rules Σ
   output: A rewriting graph (N, E)
 1 begin
 2    initialise N := {(head(r), body(r)) | r ∈ Σ} and E := ∅;
 3    assign∗ fresh predicates Pn , Rn to each n = (R, I) ∈ N with λ(Pn ) = λ(Rn ) = R;
 4    let Γ be a queue and initially Γ := N ;
 5    while Γ 6= ∅ do
 6         take n = (R, I) popped from Γ ;
 7         foreach n0 = (R0 , I 0 ) ∈ N and α ∈ R0 containing no fresh predicate do
 8             consider rule r := α ← I 0 ;
 9             foreach piece unifier µ = (B, H, τ ) of I and r do
10                if n ∈ N then
11                    take n00 := (R00 , I 0 τ 0 ) where R00 = Rτ 00 ∪ {ατ 0 } if R is a singleton;
                        otherwise R00 = {Pn (un )τ 00 , ατ 0 };
12                    assign∗ a fresh predicate Rn00 and let λ(Rn00 ) := λ(Pn )τ 00 ∪ {ατ 0 };
13                  else
14                      take n00 := (R00 , I 0 τ 0 ) where R00 = Rτ 00 ∪ {ατ 0 } if R is a singleton;
                         otherwise R00 = {Rn (v n )τ 00 , ατ 0 };
15                      assign∗ a fresh predicate Rn00 and let λ(Rn00 ) := λ(Rn )τ 00 ∪ {ατ 0 };
16                  assign∗ a fresh predicate Pn00 and let λ(Pn00 ) := ατ 0 ;
17                  add∗ n00 to N and (n, n00 ) to E labelled with µ; push∗ n00 to Γ ;
18                  take n† := (R00 , (I \ B)τ ∪ {Pn00 (un00 )});
19                  assign∗ a fresh predicate Rn† and let λ(Rn† ) = λ(Rn )τ 00 ∪ {ατ 0 };
                     push∗ n† to Γ ;
20          initialise p := n and S := I;
21          while p has a parent n0 = (R0 , I 0 ) and n0 6= n do
22               suppose the path from n0 to n has labels (B1 , H1 , τ1 ), . . . , (Bm , Hm , τm );
23               let p := n0 and S := S ∪ (I 0 \ B1 )τ1 · · · τm ;
24               take n‡ := (R, S) and reuse the predicate Rn‡ := Rn ; push∗ n‡ to Γ ;

25       return (N, E);




Theorem 1. For a set of rules Σ, if rewrite|(5) (Σ) is finite then Algorithm 1 computes
a finite rewriting graph G such that dlg(G) ≡ rewrite|(5) (Σ).


6      Experiments

We have implemented a prototype system Drewer (Datalog REWriting for Existential
Rules), where the piece unification implementation is adapted from the first-order
rewriting system Graal [13], and used RDFox1 as the datalog engine. We conducted
two set of experiments to evaluate the performance of our system. All experiments
were performed on a desktop machine with a processor at 3.30 GHz and 8GB of RAM.
1
     https://www.cs.ox.ac.uk/isg/tools/RDFox/
10       Zhe Wang, Peng Xiao, and Kewen Wang

    In the first set of experiments, we compared our system with state-of-the-art query
rewriting systems regarding the compactness and efficiency of query rewriting. In par-
ticular, Graal is a first-order rewriting system for existential rules, Iqaros [23] is a
first-order rewriting system for OWL 2 DL featured in its rewriting minimisation,
and Rapid [22] is a datalog rewriting system for DL-Lite and EL based on optimised
resolution. For benchmark ontologies and queries, we selected 3 commonly used on-
tologies [13], which are DL-Lite versions of Adolena (A), OpenGALEN2 (G), and
StockExchange (S). Each of the ontologies comes with 5 conjunctive queries.
    Table 1 records the sizes and times for query rewriting, where sizes are measured
by the numbers of atoms and the times are in milliseconds.


                               Rewriting Size                  Rewriting Time
     Ontology    Query
                            UCQ Drewer Graal         Drewer    Graal Rapid Iqaros
                   q1         27       54      2          26      15      9       3
                   q2         50       32      2           3       3      5       3
        A          q3        104       32      1           3       8      6      18
                   q4        224       35      2           2       3     15      10
                   q5        624       37      1           3       2     20     312
                   q1          2        3      1          62       6      5       5
                   q2       1152     1276      1          70      50     41    4040
        G          q3        488       80      5          99      45     20    5779
                   q4        147      155      1          60       3      2     402
                   q5        324       59     19          68      34      9    7206
                   q1          6       10      1          11       7      6       0
                   q2          2       30      1           1       0      2       3
        S          q3          4       15      1           3       1      2      11
                   q4          4       30      1           0       1      2       8
                   q5          8       16      1           1       5      3     117
                        Table 1: Comparison on query rewriting.

     Both Rapid and Iqaros output minimal rewritings of the same sizes recorded un-
der “UCQ”. To achieve efficiency as well as compactness in rewriting, Graal makes
use of the so called rule compilation, which leads to its small rewriting output. Yet,
the results of rewriting produced by Graal are not standard UCQs, which have to
be evaluated through a special homomorphism mechanism, and thus cannot be di-
rectly handled by DBMSs for datalog engines. We can see that the benefit of datalog
rewriting w.r.t. sizes compared to UCQ rewritings becomes obvious when large UCQ
rewritings are generated (≥50), with two exceptions q2 and q4 on G. This is because
the ontology has a large number of simple axioms. For rewriting times, the perfor-
mance of our system is comparable to Graal and Rapid, with an exception on G due
to the reason discussed before.
     In the second set of experiments, we compared our system with existing end-to-
end in-memory (as our system is in-memory) query answering systems regarding time
efficiency. We use RDF4j as the data store of Graal. Another system we compared is
Pagoda [25], which is not a standard rewriting system, but a hybrid system combining
a datalog engine and an OWL 2 reasoner. To compare with Graal and Iqaros, we
used the DL-Lite versions of two widely used ontologies, LUBM and Reactome [25].
                                   Practical Datalog Rewriting for Existential Rules         11

As Reactome comes with over 100 queries, we randomly selected 5 queries. Table 2
records the times (in seconds) on pre-processing (Pre) and query evaluation (Eval).


                          Drewer             Graal             Pagoda           Iqaros
    Ontology Query
                         Pre Eval           Pre Eval           Pre Eval        Pre Eval
                q1             1.49                4.73               0.82            0.00
                q2             0.46               13.87               0.00            0.00
     LUBM       q3     22.94   0.03      200.74    0.00     121.98    0.95     22.3   0.01
                q4             1.48                0.00               0.39            0.02
                q5             1.49                0.18               0.65            0.03
                q1             0.04                5.15               0.03            0.01
                q2             0.07                5.35               0.02            0.00
    Reactome    q3      7.83   0.08       59.80    5.03      13.73    0.01     8.46   0.00
                q4             0.07                5.05               0.01            0.01
                q5             0.07                5.12               0.02            0.00
                        Table 2: Comparison on query answering.
    Our system showed superior performance in most cases compared to Graal, which
is possibly due to the special homomorphism mechanism implemented in it. both
of which took significantly more time in pre-processing. Considering only the query
evaluation times, Drewer is comparable to Pagoda, whereas Pagoda took more pre-
processing time for the materialization of upper-bound and lower-bound datalog pro-
grams. Iqaros shows best performance among all the systems, while the performance
of Drewer is comparable on Reactome. Note that Iqaros is specialised for first-order
rewritable OWL 2 DL ontologies, whereas Drewer is for more general existential rules.


7    Conclusion

In this paper, we have presented a novel approach to datalog rewriting for existential
rules and introduced a new datalog rewritable class, the separable rule sets, which gen-
eralises the class of shy rule sets. Furthermore, we presented a practical algorithm for
computing datalog rewritings and established its correctness. Finally, we implemented
and evaluated our prototype system Drewer, which is to the best of our knowledge,
the first datalog rewriting system for existential rules. Our system demonstrated su-
perior or comparable performance compared to state-of-the-art rewriting and query
answering systems.
    For future work, we are working on identifying a more general class of datalog
rewritable existential rules, optimising our rewriting algorithm and implementation,
and conducting further evaluation on more complex ontologies and queries.


References
 1. S. Ahmetaj, M. Ortiz, and M. Simkus. Rewriting guarded existential rules into small
    datalog programs. In Proc. of ICDT-18, pp. 4:1–4:24, 2018.
 2. J.-F. Baget, M. Leclère, M.-L. Mugnier, and E. Salvat. On rules with existential variables:
    Walking the decidability line. Artif. Intell., 175(9-10):1620–1654, 2011.
12       Zhe Wang, Peng Xiao, and Kewen Wang

 3. M. Bienvenu, B. ten Cate, C. Lutz, and F. Wolter. Ontology-based data access: a study
    through disjunctive datalog, csp, and MMSNP. In Proc. of PODS-13, pp. 213–224, 2013.
 4. A. Calı̀, G. Gottlob, and M. Kifer. Taming the infinite chase: Query answering under
    expressive relational constraints. J. Artif. Intell. Res., 48:115–174, 2013.
 5. A. Calı̀, G. Gottlob, and T. Lukasiewicz. A general datalog-based framework for tractable
    query answering over ontologies. J. Web Sem., 14:57–83, 2012.
 6. A. Calı̀, G. Gottlob, and A. Pieris. Towards more expressive ontology languages: The
    query answering problem. Artif. Intell., 193:87–128, 2012.
 7. T. Eiter, M. Ortiz, M. Simkus, T.-K. Tran, and G. Xiao. Query rewriting for horn-shiq
    plus rules. In Proc. of AAAI-12, 2012.
 8. G. Gottlob, G. Orsi, and A. Pieris. Query rewriting and optimization for ontological
    databases. ACM Trans. Database Syst., 39(3):25:1–25:46, 2014.
 9. G. Gottlob, S. Rudolph, and M. Simkus. Expressiveness of guarded existential rule
    languages. In Proc. of PODS-14, pp. 27–38, 2014.
10. G. Gottlob and T. Schwentick. Rewriting ontological queries into small nonrecursive
    datalog programs. In Proc. of KR-12, 2012.
11. B. C. Grau, I. Horrocks, M. Krötzsch, C. Kupke, D. Magka, B. Motik, and Z. Wang.
    Acyclicity notions for existential rules and their application to query answering in on-
    tologies. J. Artif. Intell. Res., 47:741–808, 2013.
12. P. Hansen, C. Lutz, I. Seylan, and F. Wolter. Efficient query rewriting in the description
    logic EL and beyond. In Proc. of IJCAI-15, pp. 3034–3040, 2015.
13. M. König, M. Leclère, and M.-L. Mugnier. Query rewriting for existential rules with
    compiled preorder. In Proc. of IJCAI-15, pp. 3106–3112, 2015.
14. M. König, M. Leclère, M.-L. Mugnier, and M. Thomazo. Sound, complete and minimal
    ucq-rewriting for existential rules. Semantic Web, 6(5):451–475, 2015.
15. R. Kontchakov, C. Lutz, D. Toman, F. Wolter, and M. Zakharyaschev. The combined
    approach to query answering in DL-Lite. In Proc. of KR-10, 2010.
16. M. Leclère, M.-L. Mugnier, and F. Ulliana. On bounded positive existential rules. In
    Proc. of DL-16, 2016.
17. N. Leone, M. Manna, G. Terracina, and P. Veltri. Efficiently computable datalog∃
    programs. In Proc. of KR-12, 2012.
18. H. Pérez-Urbina, B. Motik, and I. Horrocks. Tractable query answering and rewriting
    under description logic constraints. J. Applied Logic, 8(2):186–209, 2010.
19. R. Rosati and A. Almatelli. Improving query answering over dl-lite ontologies. In Proc.
    of KR-10, 2010.
20. G. Stefanoni, B. Motik, and I. Horrocks. Introducing nominals to the combined query
    answering approaches for EL. In Proc. of AAAI-13, 2013.
21. M. Thomazo. Conjunctive Query Answering Under Existential Rules - Decidability,
    Complexity, and Algorithms. PhD thesis, Montpellier 2 University, France, 2013.
22. D. Trivela, G. Stoilos, A. Chortaras, and G. B. Stamou. Optimising resolution-based
    rewriting algorithms for OWL ontologies. J. Web Semant., 33:30–49, 2015.
23. T. Venetis, G. Stoilos, and V. Vassalos. Rewriting minimisations for efficient ontology-
    based query answering. In Proc. of ICTAI-16, pp. 1095–1102, 2016.
24. Z. Wang, K. Wang, and X. Zhang. Forgetting and unfolding for existential rules. In
    Proc. of AAAI-18, pp. 2013–2020, 2018.
25. Y. Zhou, B. C. Grau, Y. Nenov, M. Kaminski, and I. Horrocks. Pagoda: Pay-as-you-go
    ontology query answering using a datalog reasoner. J. Artif. Intell. Res., 54:309–367,
    2015.