<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Datalog Rewriting for Guarded TGDs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michael Benedikt</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maxime Buron</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Germano</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kevin Kappelmann</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Boris Motik</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LIRMM, Inria, Univ. of Montpellier</institution>
          ,
          <addr-line>Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Oxford University</institution>
          ,
          <addr-line>Parks Road, Oxford, OX1 3QD</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Technical University of Munich</institution>
          ,
          <addr-line>Boltzmannstraße 3, Garching, 85748</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>104</fpage>
      <lpage>113</lpage>
      <abstract>
        <p>We deal with the problem of fact entailment with respect to a database and a set of integrity constraints, focusing on the case of Guarded tuple-generating dependencies (GTGDs). The original approach to the problem in the literature is via forward reasoning or “chasing”, where one completes the input database by adding fresh elements and facts. This completion process may be infinite, but in the case of GTGDs it is known that one can compute a point where the chase can be cut of without missing any base facts. Another approach is by forming an automaton and checking it for emptiness. Neither of these approaches scales to large input datasets. An alternative approach is to rewrite the constraints into Datalog: the Datalog rewriting can be generated in advance of any dataset and will produce the same base facts as the original constraints. It is known that Datalog rewritings always exist. But to our knowledge the approach has never been implemented. In this work we overview efective algorithms to Datalog rewriting of GTGDs. This presents work that will appear in VLDB 2022.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Chase</kwd>
        <kwd>Datalog</kwd>
        <kwd>Fact Entailment</kwd>
        <kwd>Guarded TGDs</kwd>
        <kwd>Resolution</kwd>
        <kwd>Rewriting</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Reasoning with dependencies has played a large role in database theory. Dependencies can
be used to describe semantic restrictions on sources, mapping rules between datasources and
virtual data objects in data integration, and semantic rules on virtual data that allow new
data items to be derived. A fundamental computation problem associated with dependencies
is query answering: given a query , as an existentially quantified conjunction of atoms (a
conjunctive query), a collection of facts I, and a set of dependencies Σ , find all the answers
to  that can be derived from I via reasoning with Σ . For general classes of dependencies,
such as tuple-generating dependencies (TGDs) and equality-generating dependencies, query
answering is undecidable. Thus, much early work focused on dependencies for which query
answering is decidable, with two families being those with terminating chase: 1) those for
which forward-reasoning by inferring all new facts from I will terminate in a finite number of
steps – weakly-acyclic dependencies are perhaps the best-known class with terminating class –
and 2) those for which backwards reasoning terminates – these are often known as first-order
rewritable classes. Another attractive class of dependencies that emerged in the last decade
are guarded tuple-generating dependencies (GTGDs). Guarded TGDs allow to express simple
referential constraints on source instances, common mapping rules between sources and targets,
and target constraints in common description logic and ontology languages. Query-answering
for GTGDs has long been known to be decidable [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. They are also interesting and challenging
in that they do not in general have a terminating chase and are not first-order rewritable.
      </p>
      <p>
        There are several ways to derive the decidability of GTGD query-answering. One can argue
that GTGDs have the tree-like model property – it sufices to look at structures that can be
coded by trees – and then argue via reduction to Monadic Second Order Logic over trees, which
was shown decidable by Rabin [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. A refinement is to argue for the tree-like model property,
directly generate an automaton that accepts codes of tree-like models, and test non-emptiness of
the automaton [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Another variant is to create a Tableau calculus that tries to build a tree-like
model, using a form of blocking to ensure termination. This approach is widely-applied in the
setting of description logics [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and was later lifted to the setting of guarded logics [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The
ifrst approach is infeasible in practice. The second sufers from two problems: building the
automaton is extremely expensive and the non-emptiness test is both complex and expensive.
The third approach is more plausible, but it requires a huge amount of non-determinism in
choosing what tableau rules to fire in addition to the complexity of the blocking condition.
      </p>
      <p>
        An alternative approach is based on Datalog rewriting: starting from Σ alone, we create a
Datalog program rew(Σ) which produces the same base facts as Σ for any dataset I. Datalog
rewriting has its origins in work of Marnette [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], but has been extended to wider classes of
constraints [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] and refined to get more information on the program [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Since Σ and rew(Σ)
entail the same base facts on I, we can answer any existential-free conjunctive query (i.e. queries
where all variables are answer variables). The restriction to existential-free queries is technical:
existentially quantified variables in a query can be matched to objects introduced by existential
quantification, and these are not preserved in a Datalog rewriting. However, practical queries
are typically existential-free since all query variables are usually answer variables. The rewriting
approach has the advantage of being scalable in the data. Moreover, rew(Σ) can be run using
optimized Datalog engines [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ]. While the approach has been implemented in the setting of
description logics [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ], concrete algorithms in the setting of GTGDs have not yet appeared.
      </p>
      <p>
        The goal of this work is to give an account of Datalog rewriting for GTGDs. We will explain
the relationship to the tree-like model approach and provide an abstract framework, giving
suficient conditions that guarantee completeness of a Datalog rewriting algorithm. After
providing an initial algorithm that instantiates the framework, we show how the algorithm
can be optimized. We implemented the algorithms and performed an experimental evaluation,
showing that they are competitive and useful in practice. The GitHub repository [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] contains
the associated codebase and experiments.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Decidability of query answering for GTGDs was shown in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], based on cutting of the chase.
This technique has been extended to wider classes (e.g. frontier-guarded TGDs [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]).
      </p>
      <p>
        Answering queries via rewriting originated within description logics. The original emphasis
was on queries and constraints where there is a rewriting as a union of conjunctive queries
(UCQs), as in the DL-Lite family [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Datalog rewriting was extensively explored in the context
of ontologies in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], with an implementation in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In the context of TGDs, rewritings were
ifrst considered in the context of inclusion and key dependencies [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], where again one can
obtain a UCQ rewriting. Datalog-rewritability for certain answers of conjunctive queries with
respect to GTGDs was shown first in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This was extended to frontier-guarded TGDs in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
and nearly frontier-guarded and nearly guarded TGDs [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. While Datalog rewriting has been
implemented for separable and weakly separable TGDs [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], it has not been implemented for
GTGDs to our knowledge.
      </p>
      <p>
        Our work relies on connections between query rewriting and specialized versions of the
chase, dating back to work on answering queries over data sources with access patterns [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. It
was later explored for GTGDs and disjunctive GTGDs in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. This paper presents work that is
to appear in a longer version in VLDB.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Preliminaries</title>
      <p>We assume the usual notion of instance and formulas, based on infinite sets vars of variables and
Vals of values. We assume that Vals contains every element that will occur in an instance. For
 a formula or a set thereof, consts( ) and vars( ) are the sets of constants and free variables,
respectively, in  .</p>
      <p>Dependencies. A tuple generating dependency (TGD) is a first-order formula of the form:
∀⃗[ (⃗) → ∃⃗  (⃗, ⃗)] where  and  are conjunctions of atoms.  is referred to as the body
and  as the head. A TGD is full if the head contains no existential quantifiers. A TGD is
in head-normal form if it is full and its head contains exactly one atom, or it is non-full and
each head atom contains at least one existentially quantified variable. Each TGD can be easily
transformed to an equivalent set of TGDs in head-normal form. A full TGD in head-normal
form is a Datalog rule, and a Datalog program is a finite set of Datalog rules. The head-width of
a TGD  (hwidth( )) is the number of variables in the head; this is extended to sets of TGDs by
taking the maxima over all TGDs. A Guarded TGD (GTGD) is one where at least one atom in 
contains all the variables of  .</p>
      <p>
        Fact Entailment. Given a set of TGDs Σ and a set of facts , the ground closure of  under Σ is
the set of facts  using only constants from  (so-called base facts) that can be derived from 
using Σ . The fact entailment problem is to decide whether a given base fact  is in the ground
closure of  under Σ . We are interested in deciding fact entailment for GTGDs.
The Chase. Fact entailment for TGDs is semi-decidable, and many variants of the chase can
be used to define a (possibly infinite) set of facts that contains precisely all base facts entailed
by an instance and a set of TGDs. By drawing inspiration from techniques for reasoning with
guarded logics [
        <xref ref-type="bibr" rid="ref20 ref21">20, 21</xref>
        ] and referential database constraints [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], fact entailment for GTGDs can
be decided by using a variant of the chase that works on tree-like structures. Specifically, a
chase tree  consists of a directed tree, one tree vertex that is said to be recently updated, and a
function mapping each vertex  in the tree to a finite set of facts  (). A chase tree  can be
transformed to another chase tree  ′ in the following two ways.
• One can apply a chase step with a GTGD  = ∀⃗[ → ∃⃗  ] in head-normal form. The
precondition is that there exist a vertex  in  and a substitution  with domain ⃗ such that
 ( ) ⊆  (). The result of the chase step is obtained as follows.
– If  is full (and thus  is a single atom), then chase tree  ′ is obtained from  by making
 recently updated in  ′ and setting  ′() =  () ∪ { ( )}.
– If  is not full, then  is extended to a substitution  ′ that maps each variable in
⃗ to a labeled null not occurring in  . The chase tree  ′ is then obtained from 
by introducing a fresh child ′ of , making ′ recently updated in  ′, and setting
 (′) =  ′( ) ∪ { ∈  () |  is Σ -guarded by  ′( )}, where  is Σ -guarded by some
set of facts  if consts( ) ⊆ consts( ′) ∪ consts(Σ) for some  ′ ∈ .
• One can apply a propagation step from a vertex  to a vertex ′ in  . Chase tree  ′ is
obtained from  by making ′ recently updated in  ′ and setting  ′(′) =  (′) ∪  for
some nonempty set  satisfying  ⊆ {  ∈  () |  is Σ -guarded by  (′)}.
      </p>
      <p>
        A tree-like chase sequence for an instance  and a finite set of GTGDs Σ in head-normal form
is a finite sequence of chase trees 0, . . . ,  such that 0 contains exactly one root vertex 
that is recently updated in 0 and 0() = , and each  with 0 &lt;  ≤  is obtained from − 1
by a chase step with some  ∈ Σ or a propagation step. For each vertex  in  and each fact
 ∈ (), this sequence is a tree-like chase proof of  from  and Σ . It is well known that
, Σ |=  if and only if there exists a tree-like chase proof of  from  and Σ (e.g. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]).
Rewriting. A Datalog rewriting of a finite set of TGDs Σ is a Datalog program rew(Σ) such
that , Σ |=  if and only if , rew(Σ) |=  for every instance  and base fact  . If Σ contains
GTGDs only, then a Datalog rewriting rew(Σ) is guaranteed to exist (which is not the case for
general TGDs). Thus, we can reduce fact entailment for GTGDs to Datalog reasoning, which
can be solved using highly optimized Datalog techniques.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. The Chase and Datalog Rewriting</title>
      <p>Our objective is to develop algorithms that compute the rewriting of GTGDs. Intuitively, each
algorithm will derive Datalog rules that summarize sequences of steps in a tree-like chase proof.
Instead of introducing a child vertex ′ by using a chase step with a non-full GTGD at vertex
, performing some inferences in ′, and then propagating a derived fact  back from ′ to ,
these “shortcuts” will derive  in one step without having to introduce ′. The main question
is how to keep the number of derived rules low while still being able to derive all sequences
necessary for completeness. A key observation is that instead of considering arbitrary chase
proofs, we can restrict our attention to chase proofs that are one-pass according to Definition 4.1
below.</p>
      <p>Definition 4.1. A tree-like chase sequence 0, . . . ,  for an instance  and a finite set of GTGDs
Σ in head-normal form is one-pass if, for each 0 &lt;  ≤ , chase tree  is obtained by applying
one of these two steps to the recently updated vertex  of − 1:
• a propagation step copying exactly one fact from  to its parent, or
• a chase step with a GTGD from Σ provided that no propagation step from  to the parent of  is
applicable.</p>
      <p>Thus, each step in a one-pass tree-like chase sequence is applied to a “focused” vertex; steps
with non-full TGDs move the “focus” from parent to child, and propagation steps move the
“focus” in the opposite direction. Moreover, once a child-to-parent propagation takes place, the
child cannot be revisited in further steps. Theorem 4.2 states a key property about chase proofs
for GTGDs: whenever a proof exists, there exists a one-pass proof too.</p>
      <p>Theorem 4.2. For every instance , each finite set of GTGDs Σ in head-normal form, and each
base fact  such that , Σ |=  , there exists a one-pass tree-like chase proof of  from  and Σ .</p>
      <p>One-pass chase proofs are interesting because they can be decomposed into loops: a
subsequence , . . . ,  of chase steps that move the “focus” from a parent to a child vertex, perform
a series of inferences in the child and its descendants, and finally propagate one fact back to the
parent. If non-full TGDs are applied to the child, then the loop can be recursively decomposed
into further loops at the child. The properties of the one-pass chase ensure that each loop is
ifnished as soon as a fact is derived in the child that can be propagated to its parent, and that
the vertices introduced in the loop are not revisited at any later point in the proof. This way,
each loop at vertex  can be seen as taking the set () as input and producing the output fact
 that is added to  (). This leads us to the following idea: for each loop with the input set of
facts (), a rewriting should contain a “shortcut” Datalog rule that derives the loop’s output.</p>
      <p>These ideas are formalized in Proposition 4.3, which will provide us with a correctness
criterion for our algorithms.</p>
      <p>Proposition 4.3. A Datalog program Σ ′ is a rewriting of a finite set of GTGDs Σ in head-normal
form if
• Σ ′ is a logical consequence of Σ ,
• each Datalog rule of Σ is a logical consequence of Σ ′, and
• for each instance , each one-pass tree-like chase sequence 0, . . . ,  for  and Σ , and each
loop , . . . ,  at the root vertex  with output fact  , there exist a Datalog rule  →  ∈ Σ ′
and a substitution  such that  ( ) ⊆ () and  () =  .</p>
      <p>Intuitively, the first condition ensures soundness: rewriting Σ ′ should not derive more facts
than Σ . The second condition ensures that Σ ′ can mimic direct applications of Datalog rules
from Σ at the root vertex . The third condition ensures that Σ ′ can reproduce the output of
each loop at vertex  using a “shortcut” Datalog rule.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Rewriting Algorithms</title>
      <p>
        We now consider ways to produce Datalog rules that “shortcut” all necessary steps in the chase.
These are motivated and proven correct using Proposition 4.3. Each method is presented as a
set of inference rules that derive new rules (either full or non-full) from existing ones. We will
then close the original set of rules under this inference process. Since this inference process is
closely related to resolution in classical theorem proving [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], we sometimes refer to the output
of a step as a resolvent.
      </p>
      <p>
        In this short paper, we omit details of the algorithm that uses the inference rules to perform
the closure: see [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. The algorithm maintains sets of rules that have not yet been processed,
and performs an iteration. At each step it performs redundancy elimination, important not only
for eficiency but also for termination, and normalization (e.g. conversion to head normal form,
choosing canonical names for variables).
      </p>
      <sec id="sec-5-1">
        <title>The FullDR Algorithm: Creating Datalog Rules Directly. Our first algorithm will create</title>
        <p>
          only full (aka Datalog) rules. Similar algorithms have appeared in the prior literature for related
TGD classes [
          <xref ref-type="bibr" rid="ref18 ref8">8, 18</xref>
          ]. The main interest here is that this algorithm will serve as a baseline.
Definition 5.1. The Full Datalog Rewriting inference rule FullDR can be applied in two ways,
depending on the types of TGDs it takes.
• The (COMPOSE) variant of the FullDR inference rule takes full TGDs
 = ∀⃗[
→ ]
and
 ′ = ∀⃗[′ ∧  ′ →  ′]
and a substitution  such that
–  () =  (′),
– dom( ) = ⃗ ∪ ⃗, and
– rng( ) ⊆ ⃗ ∪ consts( ) ∪ consts( ′), where ⃗ is a vector of hwidth(Σ) +
variables diferent from ⃗ ∪ ⃗,
and it derives  ( ) ∧  ( ′) →  ( ′).
• The (PROPAGATE) variant of the FullDR inference rule takes TGDs
|consts(Σ) |
|consts(Σ) |
 = ∀⃗[
→ ∃⃗  ∧ 1 ∧ · · · ∧
]
and
′ ∧  ′ →  ′]
and a substitution  such that
–  () =  (′) for each  with 1 ≤  ≤ ,
– dom( ) = ⃗ ∪ ⃗,
– rng( ) ⊆ ⃗ ∪ ⃗ ∪ consts( ) ∪ consts( ′), where ⃗ is a vector of hwidth(Σ) +
variables diferent from ⃗ ∪ ⃗ ∪ ⃗,
–  (⃗) ∩ ⃗ = ∅, and
– vars( ( ′)) ∩ ⃗ = ∅ and vars( ( ′)) ∩ ⃗ = ∅,
and it derives  ( ) ∧  ( ′) →  ( ′).
        </p>
        <p>The rough idea is that the (PROPAGATE) variant simulates generation of a fact via propagation
from a child to its parent in the chase: We generate a child using one of the original rules and the
child gains a fact  guarded by the parent using a derived rule. Then (PROPAGATE) generates
a rule that adds  directly. The (COMPOSE) variant simulates transitive fact derivations inside
a node: at a node , one derived Datalog rule will generate a fact 1, and then a second derived
Datalog rule will use the facts of  unioned with 1 to generate an additional fact 2. The
(COMPOSE) variant will generate 2 directly from .</p>
        <p>The FullDR algorithm has several obvious weak points. First, it considers all possible ways to
compose Datalog rules as long as this produces a rule with at most hwidth(Σ) + |consts(Σ) |
variables. Second, it is not clear how one eficiently obtains the atoms 1, . . . ,  and ′1, . . . , ′
participating in the (PROPAGATE) variant. Third, the number of substitutions  in the
(COMPOSE) and (PROPAGATE) variants of the FullDR inference rule can be very large.</p>
        <p>Nevertheless, we implemented FullDR using subsumption and indexing techniques used for
the other algorithms. Unsurprisingly, we did not find it competitive in our experiments.
The Existential-Based Rewriting. We now consider an algorithm that generates both non-full
and Datalog rules inductively. Before defining the algorithm formally, we introduce a refined
notion of unification.</p>
        <p>Definition 5.2. For  a set of variables, an -MGU  of atoms 1, . . . ,  and 1, . . . ,  is
a most general unifier (MGU) of 1, . . . ,  and 1, . . . ,  with the additional requirement that
 () =  for each  ∈ .</p>
        <p>It is straightforward to see that an -MGU is unique up to the renaming of variables not
contained in , and that it can be computed as usual while treating variables in  as if they
were constants. We are now ready to formalize the ExbDR algorithm.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Definition 5.3.</title>
        <p>The Existential-Based Datalog Rewriting inference rule ExbDR takes GTGDs
 = ∀⃗[ → ∃⃗  ∧ 1 ∧ · · · ∧
]
and
′ ∧  ′ → ′]
with  ≥ 1, and a ⃗-MGU  of 1, . . . ,  and ′1, . . . , ′ such that
 (⃗) ∩ ⃗ = ∅
and
vars( ( ′)) ∩ ⃗ = ∅,
and it derives  ( ) ∧  ( ′) → ∃⃗  ( ) ∧  (1) ∧ · · · ∧  () ∧  (′).</p>
        <p>The intuition behind ExbDR is that it mimics two kinds of evolutions in a one-pass chase
proof. One kind starts with a parent, produces a child node and then evolves it (via some loops
rooted at the child). This evolution is mimicked by producing a non-full TGD that directly
produces the updated child from the parent. The evolution of this child may eventually include a
fact that is guarded by the parent, and hence could be propagated to the parent. The ExbDR rule
also captures this propagation by again deriving a rule with existentials that directly produces
the updated child from the parent; but when this rule is converted into head normal form, it
will break down into a non-full and a full rule, the latter of which mimics the propagation step.
Example 5.4. Consider the set of GTGDs Σ =</p>
        <p>{ 1, . . . ,  4}, where
 1 = (1) → ∃1, 2  (1, 1, 2),  2 =  (1, 2, 3) → ∃  (1, 2, ),
 3 =  (1, 2, 3) →  (1, 2),
 4 =  (1, 2, 3) ∧  (1, 2) ∧ (1) →  (1),</p>
      </sec>
      <sec id="sec-5-3">
        <title>Definition 5.5.</title>
        <p>headed rules
such that
We can first apply the ExbDR rule to  2 and  3 to obtain the Datalog rule
 5 =  (1, 2, 3) →  (1, 2). We can then take  1 and  5 to derive the non-full
 6 = (1) → ∃1, 2(︀  (1, 1, 2) ∧  (1, 1))︀ . Finally, we can take  6 and  4 to
obtain  7 = (1) ∧ (1) →  (1). No further ExbDR steps are possible and the set
rew(Σ) = { 3,  4,  5,  7} is returned.</p>
        <p>Using Skolemization. The ExbDR algorithm exhibits two drawbacks. First, each application
of the ExbDR inference combines the head atoms of two rules, so the rule heads can get very
long. Second, each inference requires matching a subset of body atoms of  ′ to a subset of the
head atoms of  . This can be costly, particularly when rule heads are long.</p>
        <p>We would ideally derive GTGDs with a single head atom and unify just one body atom of  ′
with the head atom of  , but this does not seem possible if we stick to manipulating GTGDs:
doing so would require us to refer to the same existentially quantified object in diferent GTGDs.
However, this can be achieved by replacing existentially quantified variables by Skolem terms,
which in turn gives rise to the SkDR algorithm from Definition 5.5.</p>
        <p>The Skolem Datalog Rewriting inference rule SkDR takes two Skolemized
single =  → 
and
 ′ = ′ ∧  ′ → ′</p>
      </sec>
      <sec id="sec-5-4">
        <title>Definition 5.6.</title>
        <p>headed rules
with  ≥ 1 such that
•  is Skolem-free and  contains a Skolem symbol, and
• ′ contains a Skolem symbol, or  ′ is Skolem-free and ′ contains all variables of  ′,
and, for  an MGU of  and ′, it derives  ( ) ∧  ( ′) →  (′).</p>
        <p>The intuition is that the rule applies in two situations. The first is where ′ is Skolem-free
and the atom ′ in  ′ that gets resolved with the head atom of  is a guard for the body. Such a
rule may introduce Skolems originating in  in the body of the resolvent. The second case is
where ′ contains such a Skolem symbol. In this case, the resolvent will have fewer Skolems in
its body than  ′. Successive applications of the second case can produce rules with no Skolem
in the body, and eventually can produce a Datalog rule.</p>
        <p>
          Though the Skolem rewriting can improve exponentially on ExbDR, there are also examples
that give an exponential gap in the other direction. Examples can be found in the full paper [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
Combining Several SkDR Steps Into One. The SkDR algorithm can produce many rules
with Skolem symbols in the body, which is the main reason why it can lose to ExbDR. We next
present the HypDR algorithm. It uses the hyperresolution inference rule [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] as a kind of “macro”
to combine several SkDR steps into one. Our experiments show that this can be beneficial.
        </p>
        <p>The Hyperresolution Rewriting inference rule HypDR takes Skolemized
single 1 =  1 → 1
. . .   =   → ,
and
 ′ = ′1 ∧ · · · ∧</p>
        <p>′ ∧  ′ → ′
• for each  with 1 ≤  ≤ , conjunction   is Skolem-free and atom  contains a Skolem symbol,
• rule  ′ is Skolem-free,
and, for  an MGU of 1, . . . ,  and ′1, . . . , ′, if conjunction  ( ′) is Skolem-free, it derives
 ( 1) ∧ · · · ∧  ( ) ∧  ( ′) →  (′).</p>
        <p>
          We emphasize that all presented algorithms can be proven correct using the completeness
criterion of Proposition 4.3, relying on the one-pass chase. Details are in [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. There, we also
prove the complexities of the algorithms:
Theorem 5.7. Each presented rewriting can be computed in 2ExpTime, in ExpTime for bounded ,
where  is the maximum relation arity in Σ , and in PTime if Σ is fixed ( data complexity).
        </p>
        <p>
          The resulting rewritings can thus be large in the worst case. From a theoretical point of view,
checking fact entailment via our approach is worst-case optimal [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>
          In the full paper [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], we show empirically that rewritings are suitable for practical use
by using a comprehensive collection of 428 synthetic and realistic inputs. We show that our
algorithms can indeed rewrite complex GTGDs, and that the rewriting can be successfully
processed by modern Datalog systems. Complete evaluation results are available online [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>
        We overviewed an approach for fact entailment for guarded TGDs by rewriting the GTGDs
into a Datalog program that entails the same base facts on each instance. We base this on a
connection between Datalog rewriting and an analysis of a specialized version of the chase.
This connection allowed us to arrive at our algorithms as well as to prove their correctness. We
believe this connection also makes it more intuitive why Datalog-rewritability holds. We briefly
presented several algorithms based on this approach. In the full paper [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], we provide not only
full proofs but also discuss implementation and experimental evaluation of the approaches. The
evaluation shows that the algorithms other than FullDR are competitive and useful in practice.
      </p>
      <p>
        In the future, we plan to generalize our framework to wider classes of TGDs, such as
frontierguarded TGDs, and to provide rewritings for conjunctive queries under certain answer
semantics. Moreover, we shall investigate whether the extension of our framework to disjunctive
GTGDs [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] can be used to obtain practical algorithms to rewrite into disjunctive Datalog.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lukasiewicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Calì</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gottlob</surname>
          </string-name>
          ,
          <article-title>A general datalog-based framework for tractable query answering over ontologies</article-title>
          ,
          <source>Journal of Web Semantics</source>
          <volume>14</volume>
          (
          <year>2012</year>
          )
          <fpage>57</fpage>
          -
          <lpage>83</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Rabin</surname>
          </string-name>
          ,
          <article-title>Decidability of second-order theories and automata on infinite trees</article-title>
          ,
          <source>Transactions of the american Mathematical Society</source>
          <volume>141</volume>
          (
          <year>1969</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Bárány</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ten
            <surname>Cate</surname>
          </string-name>
          , L. Segoufin, Guarded negation,
          <source>J. ACM</source>
          <volume>62</volume>
          (
          <year>2015</year>
          )
          <fpage>1</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Patel-Schneider</surname>
          </string-name>
          (Eds.),
          <source>The Description Logic Handbook: Theory, Implementation and Applications</source>
          , 2nd ed., Cambridge University Press,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Hirsch</surname>
          </string-name>
          , Guarded logics: Algorithms and bisimulation,
          <year>2002</year>
          . Available at http://www. umbrialogic.com/hirsch-thesis.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Marnette</surname>
          </string-name>
          ,
          <article-title>Resolution and datalog rewriting under value invention and equality constraints</article-title>
          ,
          <source>arXiv preprint arXiv:1212.0254</source>
          (
          <year>2012</year>
          ). http://arxiv.org/abs/1212.0254.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.-F.</given-names>
            <surname>Baget</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.-L. Mugnier</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Rudolph</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Thomazo, Walking the complexity lines for generalized guarded existential rules</article-title>
          ,
          <source>in: IJCAI</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Gottlob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rudolph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Simkus</surname>
          </string-name>
          ,
          <article-title>Expressiveness of guarded existential rule languages</article-title>
          ,
          <source>in: PODS</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V.</given-names>
            <surname>Bárány</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Benedikt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Ten</given-names>
            <surname>Cate</surname>
          </string-name>
          , Rewriting Guarded Negation Queries, in: MFCS,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B.</given-names>
            <surname>Motik</surname>
          </string-name>
          ,
          <article-title>Reasoning in description logics using resolution and deductive databases</article-title>
          ,
          <source>Ph.D. thesis</source>
          , Karlsruhe Institute of Technology, Germany,
          <year>2006</year>
          . URL: http://digbib.ubka. uni-karlsruhe.de/volltexte/1000003797.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmetaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ortiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Simkus</surname>
          </string-name>
          ,
          <article-title>Rewriting guarded existential rules into small datalog programs</article-title>
          ,
          <source>in: ICDT</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>U.</given-names>
            <surname>Hustadt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Motik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Sattler</surname>
          </string-name>
          ,
          <article-title>Reasoning in description logics by a reduction to disjunctive datalog</article-title>
          ,
          <source>JAR</source>
          <volume>39</volume>
          (
          <year>2007</year>
          )
          <fpage>351</fpage>
          -
          <lpage>384</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>U.</given-names>
            <surname>Hustadt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Motik</surname>
          </string-name>
          , U. Sattler,
          <article-title>Reducing SHIQ-description logic to disjunctive datalog programs</article-title>
          .,
          <source>KR</source>
          (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Benedikt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Buron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Germano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kappelmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Motik</surname>
          </string-name>
          , Guarded saturation,
          <year>2021</year>
          . URL: https://krr-oxford.github.io/Guarded-saturation/.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. D.</given-names>
            <surname>Giacomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          ,
          <article-title>Tractable reasoning and eficient query answering in description logics: The DL-Lite family</article-title>
          ,
          <source>JAR</source>
          <volume>39</volume>
          (
          <year>2007</year>
          )
          <fpage>385</fpage>
          -
          <lpage>429</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Calì</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          ,
          <article-title>Query rewriting and answering under constraints in data integration systems</article-title>
          , in: IJCAI,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <article-title>Query answering for existential rules via eficient datalog rewriting</article-title>
          ,
          <source>in: IJCAI</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Amarilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Benedikt</surname>
          </string-name>
          ,
          <article-title>When can we answer queries using result-bounded data interfaces?</article-title>
          ,
          <source>in: PODS</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kappelmann</surname>
          </string-name>
          ,
          <article-title>Decision Procedures for Guarded Logics</article-title>
          , CoRR abs/
          <year>1911</year>
          .03679 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M. Y.</given-names>
            <surname>Vardi</surname>
          </string-name>
          ,
          <article-title>Why is modal logic so robustly decidable?</article-title>
          ,
          <source>in: DIMACS Series in Disc. Math. and TCS</source>
          , volume
          <volume>31</volume>
          ,
          <year>1997</year>
          , pp.
          <fpage>149</fpage>
          -
          <lpage>184</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>H.</given-names>
            <surname>Andréka</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. van Benthem</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Németi</surname>
          </string-name>
          ,
          <article-title>Modal languages and bounded fragments of predicate logic</article-title>
          ,
          <source>J. Phil. Logic</source>
          <volume>27</volume>
          (
          <year>1998</year>
          )
          <fpage>217</fpage>
          -
          <lpage>274</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , A. C.
          <article-title>Klug, Testing Containment of Conjunctive Queries under Functional and Inclusion Dependencies</article-title>
          , JCSS
          <volume>28</volume>
          (
          <year>1984</year>
          )
          <fpage>167</fpage>
          -
          <lpage>189</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>J. A. Robinson,</surname>
          </string-name>
          <article-title>A machine-oriented logic based on the resolution principle</article-title>
          ,
          <source>JACM</source>
          <volume>12</volume>
          (
          <year>1965</year>
          )
          <fpage>23</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>M.</given-names>
            <surname>Benedikt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Buron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Germano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kappelmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Motik</surname>
          </string-name>
          ,
          <article-title>Rewriting the infinite chase</article-title>
          ,
          <source>in: VLDB</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>L.</given-names>
            <surname>Georgieva</surname>
          </string-name>
          , U. Hustadt,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <article-title>Hyperresolution for guarded formulae</article-title>
          ,
          <source>Journal of Symbolic Computation</source>
          <volume>36</volume>
          (
          <year>2003</year>
          )
          <fpage>163</fpage>
          -
          <lpage>192</lpage>
          . First Order Theorem Proving.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>