=Paper= {{Paper |id=Vol-2663/paper-22 |storemode=property |title=First Order Rewritability for Ontology Mediated Querying in Horn-DLFD |pdfUrl=https://ceur-ws.org/Vol-2663/paper-22.pdf |volume=Vol-2663 |authors=David Toman,Grant Weddell |dblpUrl=https://dblp.org/rec/conf/dlog/TomanW20 }} ==First Order Rewritability for Ontology Mediated Querying in Horn-DLFD== https://ceur-ws.org/Vol-2663/paper-22.pdf
First Order Rewritability for Ontology Mediated
           Querying in Horn-DLF D

                         David Toman and Grant Weddell

                Cheriton School of CS, University of Waterloo, Canada
                            {david,gweddel}@uwaterloo.ca


        Abstract. We study first order (FO) rewritability for query answering
        in the setting of ontology mediated queries (OMQ) in which ontologies
        are formulated in the Horn fragments of the feature description logic
        DLFD. In general, OMQ approaches for logics such as DLFD rely on
        non-FO completion of the data (ABoxes) that can commonly be ex-
        pressed as Datalog programs. In this paper, we study the problem of the
        existence of FO rewritings for a given Horn-DLFD theory (TBox) in
        terms of Beth definability, and show how Craig interpolation can then be
        used to effectively construct the rewritings (when they exist) from the
        Clark’s completion of Datalog programs for computing ABox comple-
        tions. In addition, since data that populates OMQ is commonly residing
        in relational database systems, we show how constraints enforced by such
        systems can be used to expand the scope of rewritable queries.


1     Introduction
In earlier work [25,27], we presented a combined-combined approach to ontology
mediated querying (OMQ) of backend relational data sources when ontologies
are formulated as TBoxes in the Horn fragments of the description logic (DL)
DLFD and where ABoxes are induced by data residing in the data sources. The
logic DLFD is the most expressive member of the FunDL family of feature-based
DLs [26], which are particularly well suited for capturing relational schemata by
virtue of being feature-based and also by virtue of including the path functional
dependency (PFD) concept constructor. PFDs makes it possible to capture a
variety of equality generating dependencies commonly occurring in relational
schemata, such as primary keys, uniqueness constraints and functional depen-
dencies. To illustrate, we show in the next section how the relational schema in
Figure 1 can be captured as a Horn-DLFD TBox, the FunDL dialect that will
be the focus of the remainder of this paper. This example also serves to illustrate
how the parts of a TBox that reflect a relational schema could in principle be
automatically generated, and thereby make it unnecessary to author so-called
mapping rules in ontology based data access (OBDA) (again see [25,27]).
    The combined-combined approach is so named because any of the Horn frag-
ments of DLFD in the FunDL family will require using ontological knowledge,
    Copyright c 2020 for this paper by its authors. Use permitted under Creative Com-
    mons License Attribution 4.0 International (CC BY 4.0).
a Horn-DLFD TBox in our case, for two steps in OMQ. The first step, as with
the combined approach to OMQ [19,20,22,23], is to complete the ABox via a
Datalog program that adds tuples to tables in the backend relational database.
Note that this first step depends only on the TBox. The second step, as with
the perfect rewriting approach to OMQ and OBDA [9], is to rewrite a query to
account for such artifacts as inheritance and attribute typing in the TBox.
    In this paper, we show how Beth definability can be used to resolve the
problem of recognizing when a first order (FO) rewriting of the Datalog program
needed in the first step exists, and then how Craig interpolation can be used to
effectively construct such a rewriting. A key insight is to employ the Clark’s
completion of the Datalog program in both the diagnosis and synthesis of FO
expressibility of ABox completion.
    The existence of such a rewriting enables an OMQ frontend to a relational
data source to operate entirely by a more refined query reformulation that, for
a given user query, ultimately yields an SQL query over the data source, that
is, with no requirement to update tables in the data source beforehand. An
additional advantage of our approach is that it also becomes straightforward to
incorporate the benefits of integrity constraints such as foreign key constraints
that are enforced by a relational database management system to simplify query
reformulation. Indeed, such integrity constraints can enable FO rewriting of data
completion that would otherwise not be possible.
    In summary, our contributions are as follows.

1. We show how to decide FO rewritability of OMQ in Horn-DLFD via Clark’s
   completion of Datalog programs and Beth definability;
2. We show how an equivalent non-recursive Datalog program can be derived
   via Craig’s theorem and interpolation in cases where ABox completion for a
   given TBox is indeed FO expressible;
3. We illustrate how integrity constraints in backend relational schema can
   enable FO rewritability of OMQ that would otherwise not be possible; and
4. We show how our framework extends to query specific OMQ in a straight-
   forward manner.

The remainder of the paper is organized as follows. Section 2 immediately fol-
lowing provides the necessary background and definitions. Here, we review Horn-
DLFD and the combined combined approach to OMQ. Our main results follow
in Section 3 in which we show how the above-mentioned artifacts, Clark’s com-
pletion of Datalog programs for example, can be employed to both decide FO
rewritability and to synthesize FO rewritings of ABox completion in the com-
bined combined approach to OMQ. In Section 4 shows how a Datalog program
computing an ABox completion can usefully incorporate database constraints
enforced by backend relational database systems. Section 5 then considers query
specific OMQ, that is, when the combination of a Datalog ABox completion
procedure and a specific query are FO expressible. Summary comments and our
conclusions are in Section 6.
          create table EMP ( name STRING not null,
             dept STRING, man STRING,
             primary key ( name ),
             constraint deptRef foreign key ( dept ) references DEPT,
             constraint manRef foreign key( man ) references EMP );

          create table DEPT ( name STRING not null,
             head STRING,
             primary key ( name ),
             candidate key ( head ),
             constraint headRef foreign key( head ) references EMP );


                            Fig. 1. A Relational Schema.

2     Background and Definitions

All member dialects of the FunDL family of DLs [26] are fragments of FOL
with underlying signatures based on disjoint sets of unary predicate symbols
PC, called primitive concepts, constant symbols IN, called individuals, and unary
function symbols F, called features. The latter replace roles in other DLs and
are interpreted as either total functions or partial functions, depending on the
particular FunDL dialect.
   The most expressive member of the FunDL family is the dialect DLFD.
In this paper, we consider Horn-DLFD, the most expressive Horn fragment
of DLFD, assuming in particular that an ontology is given as a Horn-DLFD
TBox.1 To simplify our presentation, we also assume features are interpreted as
total functions.

Definition 1 (Horn-DLFD) A path function Pf is a word in F∗ , with the
empty word denoted by id , and concatenation by “.”. Concept descriptions of
two kinds, C and D, are defined by the grammar rules on the left-hand-side
of Figure 2, where A ∈ PC, f ∈ F and Pf (possibly subscripted) refers to a
path function. An instance of the final production is called a path functional
dependency (PFD).
   Semantics is defined in the standard way with respect to an interpretation
I = (4, (·)I ) that fixes the meaning of symbols in PC, IN and F. Here, features
are interpreted as total functions. We also assume the unique name assumption
(UNA) whereby, for any distinct individuals a and b, aI 6= bI .
   The interpretation function I is extended to path expressions by interpreting
id (the empty word) as the identity function λx.x, concatenation as function
composition, and to complex concept descriptions C or D as given on the right-
hand-side of Figure 2. An interpretation I satisfies a subsumption C v D if
1
    Here, we do not consider the problem of recognizing when TBoxes in very expressive
    DLs have unique minimal models.
           Syntax                                        Semantics: “(·)I ”
C ::= A                          AI ⊆ 4
   | ∀ Pf .C                     {x | Pf I (x) ∈ C I }
   | C1 u C2                     C1I ∩ C2I
D ::= C                          CI ⊆ 4
   | ⊥                           ∅
   | ∃f −1                       {x | ∃y ∈ 4 : f I (y) = x}
   | ∀ Pf .D                     {x | Pf I (x) ∈ DI }
                                                    Vk
   | C : Pf 1 , . . . , Pf k → Pf {x | ∀y ∈ C I :     i=1
                                                            Pf Ii (x) = Pf Ii (y) ⇒ Pf I (x) = Pf I (y)}


                             Fig. 2. Horn-DLFD Concepts.

C I ⊆ DI , a concept assertion A(a) if aI ∈ AI , and a feature assertion f (a) = b
if f I (aI ) = bI , where a, b ∈ IN.
    A knowledge base (KB) K = (T , A) consists of a TBox T of subsumptions,
and an ABox A of assertions. I satisfies K if it satisfies each subsumption and
assertion in K.
Although incorporating features deviates from the normal practice of using bi-
nary predicate symbols called roles, features make it easier to incorporate con-
cept constructors suited to the capture of relational data sources that include
various dependencies by a straightforward reification of n-ary predicates. Thus,
e.g., a role R can be reified as a primitive concept RC and two features domR
and ranR in Horn-DLFD, and a subsumption A v ∀R.B can then be captured
as a subsumption ∀domR.A v ∀ranR.B.
    As usual, to ensure decidability in the presence of ABoxes [28] and reasonable
computational properties of the logic, one looks for additional restrictions on the
PFD concept constructors. One restriction, which has been investigated (e.g.,
[17,27]), is to limit the PFD constructor to one of the following two forms:

                        1. C : Pf 1 , . . . , Pf . Pf i , . . . , Pf k → Pf or
                        2. C : f1 , . . . , fk → g

We can allow a more complex variant of (2) if the ABox completion admits/requi-
res creating additional constant symbols; for details—beyond the scope of this
paper—see [27]. Also, in this paper we will not explore the parameter tractable
fragments of Horn-DLFD nor the issues connected with interpreting features
as partial functions. Both these issues were studied in [25] which shows, among
other things, how partiality can be easily simulated in Horn-DLFD.

Example 2 Figure 3 lists the subsumptions for a Horn-DLFD TBox that pro-
vides an ontological view of the relational schema illustrated in Figure 1. Part (a)
are subsumptions that capture column definitions and data dependencies mani-
fest in the schema itself. This part of an ontology can be automatically generated
 (disjoint classes) EMP u DEPT v ⊥          (disjoint columns) EMP v DEPT : name → id
                  EMP u STRING v ⊥             (b) additional data dependencies
                  DEPT u STRING v ⊥
(attribute typing) EMP v ∀name.STRING             (inheritance) MAN v EMP
                  EMP v ∀deptRef.DEPT            (local typing) EMP v ∀manRef.MAN
                  EMP v ∀manRef.EMP                             DEPT v ∀headRef.EMP
                  DEPT v ∀headRef.MAN                (inverses) MAN v ∃headRef−1
                  DEPT v ∀name.STRING              (c) ontology enhancement
   (primary keys) EMP v EMP : name → id
                  DEPT v DEPT : name → id
 (candidate keys) DEPT v DEPT : head → name
(a) derived from the relational schema


                  Fig. 3. An Ontology as a Horn-DLFD TBox.

(for an outline of a procedure to do this, see [25]). Observe in particular how
the name attached to a foreign key constraint, such as headRef, is mapped to
a feature with the same name. Part (b) illustrates how additional data depen-
dencies not expressible without appeal to general assertions in SQL can also be
captured in Horn-DLFD. Here, the (presumed) fact that name values of em-
ployees are disjoint from name values of departments is captured by appeal to
an asymmetric use of PFDs. Finally, Part (c) illustrates how an ontology can
be enhanced by adding further subsumptions, in this case relating to the virtual
concept of a manager.

    Conjunctive queries are, as usual, formed from atomic queries (or atoms) of
the form “A(x)” and “x.f = y”, where x and y are variables, using conjunc-
tion and existential quantification. To simplify notation, we conflate conjunctive
queries with the set of its constituent atoms and a set of answer variables:

Definition 3 (Conjunctive Queries and Certain Answers) Let A(xi ) and
xi1 .fi = xi2 be (first-order) atoms, where A is a primitive concept , fi is a feature,
and x̄ and ȳ tuples
               V       of variables. A conjunctive query (CQ) ϕ is an expression of
the form ∃ȳ. ψ with free variables x̄, that is constructed from the above atoms
ψ using conjunctions and existential quantification.
     Let K be a Horn-DLFD knowledge base and ϕ a CQ. A certain answer to ϕ
over K is a substitution of constant symbols ā, [x̄ 7→ ā], such that K |= ϕ[x̄ 7→ ā].

As usual, UCQ stands for a union of CQs. To study the first-order rewritability
of conjunctive queries over Horn-DLFD TBoxes, we use the following version of
the combined combined approach to OMQ in our setting.

Proposition 4 (Combined Combined Approach to OMQ [25,27,30])
Let K = (T , A) be a Horn-DLFD knowledge base and ϕ a conjunctive query.
Then there is a UCQ query ϕT and a Datalog program ΠT , both of which can
be effectively constructed from T , such that

                   K |= ϕ[x̄ 7→ ā] ⇐⇒ ΠT (A) |= ϕT [x̄ 7→ ā]

for a tuple of constant symbols ā and where ΠT (A) is the minimal model of ΠT
when evaluated over A.

The proposition uses a Datalog program ΠT to define an ABox completion over
which the query ϕT , the rewriting of the original user query, is evaluated to com-
pute the certain answers. The Proposition also shows that the data complexity
of OMQ in Horn-DLFD is complete for PTIME.


3   Classification of Horn-DLF D TBoxes

To simplify the presentation, we assume that Horn-DLFD TBoxes are in a
normal form given as follows:

Definition 5 (Normal Form of Horn-DLFD TBoxes)
Let T be a Horn-DLFD TBox. We say that T is in normal form if all subsump-
tions adhere to one of the following six forms:

            1. A v ⊥,                  4. ∀f.A v B,
            2. A1 u A2 v B,            5. A v ∃f −1 , and
            3. A v ∀f.B,               6. A v B : Pf 1 , . . . , Pf k → Pf.

It is an easy exercise to convert an arbitrary Horn-DLFD TBox to its conserva-
tive extension in this normal form by introducing additional primitive concepts.

Definition 6 (Datalog Program ΠT ) The Datalog program ΠT used in Pro-
position 4 consists of completion rules obtained by translating subsumptions that
are logical consequences of T . The form of these subsumptions and their trans-
lation are given as follows:

              (consequences of T )        (completion rule in ΠT )
              Av⊥                        C⊥ (x) ← CA (x)
              A1 u A2 v B                CB (x) ← CA1 (x), CA2 (x)
              A v ∀f.B                   CB (x) ← CA (y), Rf (y, x)
              ∀f.A v B                   CB (x) ← CA (y), Rf (x, y)

For every primitive concept B, we introduce an unary EDB predicate PB (x)
together with an additional clause CB (x) ← PB (x) that captures explicit data
in an ABox of the form A(a), and an IDB predicate CB (x) corresponding to the
completion of the ABox w.r.t. T . For features, we use Rf (x, y) as a synonym
for ABox assertion f (x) = y to conform with standard Datalog syntax. We also
define an auxiliary symbol C⊥ (x) that will be used to test for ABox consistency.
Note that the subsumptions for unqualified inverse features (5) and PFDs (6) do
not contribute to the definition of an ABox completion since they do not generate
additional ABox assertions for primitive concepts. Also note that, unlike in the
classical combined approach [19,23], our combined combined approach does not
introduce any new constants in the ABox [27].
    To test for first-order definability of the completion (i.e., all predicates that
stand for the completed ABox instance), we use the following construction:

Definition 7 (Clark’s Completion ΣT ) The Clark’s Completion [11] ΣT of
ΠT is given by a set of formulas
                     CB (x) ↔ PB (x) ∨ (∃y.α1 ) ∨ . . . ∨ (∃y.αn )
corresponding to all clauses CB (x) ← αi ∈ ΠT with the head CB (x).
This completion closes the original Datalog program in the following sense:

Proposition 8 ([11], simplified for this paper)
 – ΠT |= CB (a) implies ΣT |= CB (a), and
 – ΠT 6|= CB (a) implies ΣT |= ¬CB (a).
Note that Clark’s result works in the much more general setting of logic pro-
grams with function symbols and possibly infinite resolution proofs and under
the Negation As Failure semantics. Since Clark’s completion makes all IDB pred-
icates closed, we can now use standard tools for testing for explicit definability.
Note that had we used ΠT instead, none of the definability results could possibly
hold. Note that in absence of role/feature subsumptions (such as role hierarchies)
we do not need to apply the completion to the Rf atoms.

Proposition 9 (Projective Beth Definability [3]) Let Σ be an FO theory
over symbols in L and L0 ⊆ L. Then the following are equivalent:
 1. For M1 and M2 models of Σ such that M1|L0 = M2|L0 , it holds that M1 |=
    ϕ[a] iff M2 |= ϕ[a] for all M1 , M2 , and a tuples of constants, and
 2. ϕ is equivalent under Σ to a formula ψ in L0 .
This gives us a complete characterization of first-order rewritability of the ABox
closure of individual primitive concepts with respect to Horn-DLFD TBoxes as
follows:

Theorem 10 Let T be a Horn-DLFD TBox over the primitive concepts and
features {A1 , . . . , Ak , f1 , . . . , fn }, and let T 0 be a conservative extension of T
in normal form. Then the completion of the primitive concept A w.r.t. T is
first-order definable if and only if CAi (x) is Beth definable over ΣT 0 and L0 =
{PA1 , . . . , PAk , Rf1 , . . . , Rfn }.
Proof. Follows immediately from the properties of Beth definability (Proposi-
tion 9) and the definition and properties of the Clark’s completion (Proposi-
tion 8).
Observe that one can restrict the alphabet of the ABox (L0 ) to target only
ABoxes over restricted signature(s).
    Given ΣT , one can now reformulate (1) in Proposition 9 as a logical impli-
cation problem by making a copy of all formulas of ΣT in which all non-logical
symbols not in {PA1 , . . . , PAk , Rf1 , . . . , Rfn } are starred. Hence, the definability
question for CA (x) can be expressed as a logical implication question of the form:

                           ΣT ∪ ΣT∗ |= ∀x.CA (x) → CA
                                                    ∗
                                                      (x)                               (1)

Note that, on closer inspection, all formulas in ΣT can be written as ALCI
subsumptions.2 Hence:

Theorem 11 Let T be a Horn-DLFD TBox. Then the existence of
 1. the first order rewritability of the A completion with respect to T ,
 2. the consistency of T , and
 3. the uniform query rewritability over T
are all decidable and in EXPTIME.
Proof. The first claims follow immediately from Theorem 10 applied to all atoms
of the form CB and the decidability and complexity of reasoning in ALCI. The
second claim relies on definability of C⊥ (x) and, in the presence of key PFDs of
the form A v B : Pf 1 , . . . , Pf k → Pf, on the definability of CA (x) and CB (x)
since the remaining consistency check for PFD satisfaction can then be expressed
as a first-order query with respect to these explicit definitions. The last claim
follows by observing that (i) definability of atomic queries implies definability
of arbitrary (U)CQs using the combined-combined approach [25], and that (ii)
non-definability of a single atomic query exhibits the need for a non first-order
ABox completion for queries containing/consisting of this atom.
Note that in the third case one can restrict the definability conditions to atoms
that can appear in the query ϕT . A matching lower bound can be obtained
for expressive fragments of Horn-DLFD (for which the reasoning complexity is
EXPTIME-complete). However, the exact complexity is open for PTIME frag-
ments, such as CF Dnc and CF DI ∀− . Note, however, that since the size (and the
                                   kc
construction) of rewritings will dominate this cost (even for the simplest ontology
languages [18]), exact complexity bounds are mostly of academic interest.

Example 12 Returning to our introductory example, we find out that neither
CEMP (x) nor CDEPT (x) nor the TBox-derived CMAN (x) atom are first-order definable.
We confirmed this using a tableau-based depth-limited interpolant generator
[16,31]. One of the reasons is that an ABox can contain a chain of alternating
deptRef/empRef features without having the objects along this chain alternately
belonging to the EMP and DEPT concepts, save the individual at the beginning of
this chain. The Datalog completion rules are needed to propagate these concept
assertions along such a chain.
2
    Disregarding functionality of the original features does not matter at this juncture.
3.1   Construction of Rewritings
To obtain an algorithm that constructs rewritings from our characterization of
FO rewritability, we utilize Craig Interpolation:

Proposition 13 (Craig Interpolation [12]) Let ϕ and φ be first order for-
mulas such that |= ϕ → φ. Then there is a first order formula ψ, called Craig
interpolant, containing only symbols common to ϕ and φ such that |= ϕ → ψ
and |= ψ → φ.
Moreover, the interpolant can be extracted, typically in linear time, from a proof
of |= ϕ → φ, as long as a reasonably structural proof system, such as resolution,
(cut-free) sequent calculus, and/or analytic tableau is used.
    The formulation (1), after a minor manipulation of the formulas, can thus
be used to construct a Craig interpolant for CA (x) from a (analytic) tableau
proof of the above logical implication statement (if one exists): one can con-
struct, with the help of blocking [1,13], a finite analytic tableau for (1). The
size of the rewriting, however, will depend crucially on the proof system used
and on the presentation of the rewriting. However, by using an optimal tableau
for ALC [13] (modified for ALCI), one can construct an exponential represen-
tation of the rewriting by sharing common subexpressions in the same way the
tableau algorithm caches already unsatisfiable sets. However, plain first-order
representation of the formula will suffer from additional exponential blowup, as
one would expect.
    Combining the above construction with the already existing rewriting ϕT
from [25] we get:

Theorem 14 Let K = (T , A) be a Horn-DLFD knowledge base. Then the data
complexity of conjunctive query answering is in AC 0 whenever the A completion
with respect to T is first-order definable with respect to ΣT .
Proof. Let ψA (x) be a first order definition of CA w.r.t. ΣT . Then K |= ϕ(a) iff
A |= ϕT [ψA [y/x]/A(y) | A ∈ PC](a). The claim follows since ϕT [ψA [y/x]/A(y) |
A ∈ PC] is a first order formula, in particular a UCQ in the case of Horn-DLFD.


4     Integration of Database Constraints
For ABoxes constructed from relational instances (see the discussion in our In-
troduction and Background Sections), relational systems commonly enforce in-
tegrity constraints such as primary and foreign keys. The effect of these database
constraints is that parts of the corresponding ABox are not only consistent with
such constraints but are a model. This observation is in particular helpful in
the case of foreign keys since they stipulate that parts of the ABox completion
are no longer necessary: for a foreign key to hold in an instance of a relational
database, either its target exists in the appropriate table or the source is NULL,
which can then be interpreted as value unknown and is taken care of by query
rewriting ϕT .
Definition 15 (Adding Foreign Keys) Let A v B and A v ∀f.B be sub-
sumptions corresponding to foreign keys that capture ISA and attribute typ-
ing, respectively. For each such constraint, add PB (x) ← CA (x) and PB (x) ←
CA (y), Rf (x, y), respectively, to ΣA .
This definition allows us to sidestep parts of the ABox completion that are
mandated by the database constraints and thus already exist in the original
instance of the ABox. As a consequence, we have the following extension of
Theorem 14:

Theorem 16 Let K = (T , A) be a Horn-DLFD knowledge base. Then the data
complexity of conjunctive query answering is in AC 0 whenever the A completion
with respect to T is first-order definable with respect to the theory ΣT ∪ ΣA .

Example 17 Continuing with our example, adding the constraints

                        PEMP (x) ← CEMP (y), RmanRef (x, y)
                        PEMP (x) ← CDEPT (y), RdeptRef (x, y)
                        PDEPT (x) ← CEMP (y), RheadRef (x, y)

to ΣA , both CEMP (x) and CDEPT (x) become definable with the expected inter-
polants CEMP (x) and CDEPT (x) found by the above-mentioned interpolant genera-
tor [16,31]. Moreover, CMAN (x) also becomes definable by ∃y.PEMP (x)∧RmanRef (y, x)
(again, using the interpolant found by the interpolant generator).


5   Query-specific Rewritability
Our approach also provides decidability for the non-uniform (query-specific)
problems:

Theorem 18 Let T be a Horn-DLFD TBox and ϕ a (U)QC. Then the follow-
ing are equivalent:
1. ΣT ∪ ΣT∗ |= ∀x.ϕT → ϕ∗T (x), and
2. ϕ is first-order rewritable with respect to T .
The exact complexity again depends on the complexity of (1) above. In the
general case, an EXPTIME bound follows from [15], but again, a more refined
analysis is in order for fragments of Horn-DLFD.


6   Summary and Conclusions
We have shown how the combined combined approach to OMQ can be used
to determine when query answering mediated by a Horn-DLFD TBox can be
reduced to a first order query over a plain ABox (or, in turn, over the underlying
data source), and when a non-first order (Datalog-based) completion of the data
is needed. Interestingly enough, and unlike other approaches that aim on deter-
mining first-order rewritability, our approach links the problem to well studied
issues of explicit definability [3], interpolation [12], and to Clark’s completion of
logic programs [11]. In addition, our approach naturally and seamlessly incorpo-
rates database constraints that hold in the data source to improve our chances
of finding a first order rewriting.
    Further extensions of this work along the following lines is needed:

 – Performing a fine-grained analysis of computational complexity for tractable
   members of Horn-DLFD, such as CF Dnc . (We conjecture that the rewritabil-
   ity problem will remain intractable.)
 – Determining if a dichotomy holds, in particular, if non-rewritability implies
   NL hardness of OMQ in data complexity. (We conjecture that this is the
   case since blocked branches of a tableau proof signal non-rewritability and
   can be used to simulate, e.g., s-t connectivity similarly to the approach in
   [21].)
 – Extensions to situations in which ABox closure requires a finite number
   of additional auxiliary constants. (The approach itself can immediately use
   Skolem functions to deal with such constants while retaining completeness.
   However, decidability and complexity issues stemming from the use of Skolem
   functions remain to be determined.)
 – Extensions to DLs outside of the FunDL family, such as the various members
   of the EL family and to more expressive logics such as Horn-SHIQ.
 – And integration with an interpolation-based relational query optimizer [16,31]
   via depth-limited tableau construction, which, for a sufficiently large depth,
   leads to completeness as in [10].

An additional issue arises from the assumption of UNA. Indeed, this assumption
is implicit in [21], where explicit equality among constant symbols is simply not
considered, and also in most of the other approaches to OMQ via query rewriting,
including the original approach introduced in [9]. In the case of Horn-DLFD,
UNA is needed explicitly to accommodate keys and PFDs. We conjecture that
some form of UNA is necessary, but whether restricted variants could suffice is
unknown.


6.1   Related Work

First order rewritability for Horn logics in the ALC/SHIQ family has been
studied by others, e.g., see [4,6]. This earlier work has also developed algorithms
for generating such rewritings efficiently for logics in the EL family [14]. Our
approach seems to provide an alternative path to detecting rewritability and to
generating rewritings. A feature of our approach is its link to interpolation-based
query optimization [16,29]. The link to query optimization reveals that minimal
sized rewritings are often not optimal for query execution. However, establishing
limits on the size of rewriting [5] does provide a guide to what rewritings are
reasonable to consider during query optimization.
    The use of database constraints, or constraints implied by data mapping
rules and database constraints, has been explored in several systems that im-
plement variants of perfect rewriting [9], such as Ontop and MASTRO [2,8] and
others. Our approach, however, seamlessly accommodates such constraints into
the rewriting via interpolation and, again, could serve as an unifying approach
between perfect rewriting-based approaches and the combined approach, even
for DL-Lite based approaches to OMQ/OBDA.
    Our approach cannot deal with rewritings to Datalog/and or Linear Datalog
and with dichotomies on the NL-PTIME [21] and PTIME-coNP boundaries
[24]: extensions of definability/interpolation beyond first order theories do not
seem to exist at present and it is unclear whether similar approaches can be
developed due to the close ties between interpolation and compactness of first
order theories.
    Finally, note that Beth definability and Craig Interpolation have been used
for other purposes, such as query reformulation under first-order constraints
[7,16,29,31]. That use, however, is orthogonal to the topic of this paper.


References
 1. Franz Baader, Diego Calvanese, Deborah L. McGuinness, Daniele Nardi, and Pe-
    ter F. Patel-Schneider. The Description Logic Handbook: Theory, Implementation,
    and Applications. Cambridge University Press, 2003.
 2. Timea Bagosi, Diego Calvanese, Josef Hardi, Sarah Komla-Ebri, Davide Lanti,
    Martin Rezk, Mariano Rodriguez-Muro, Mindaugas Slusnys, and Guohui Xiao.
    The ontop framework for ontology based data access. In The Semantic Web and
    Web Science - 8th Chinese Conference, CSWS 2014, Revised Selected Papers, pages
    67–77, 2014.
 3. Evert Willem Beth. On Padoa’s method in the theory of definition. Indagationes
    Mathematicae, 15:330–339, 1953.
 4. Meghyn Bienvenu, Peter Hansen, Carsten Lutz, and Frank Wolter. First order-
    rewritability and containment of conjunctive queries in horn description logics. In
    Proceedings of the 29th International Workshop on Description Logics, Cape Town,
    South Africa, 2016.
 5. Meghyn Bienvenu, Stanislav Kikot, Roman Kontchakov, Vladimir V. Podolskii,
    Vladislav Ryzhikov, and Michael Zakharyaschev. The complexity of ontology-
    based data access with OWL 2 QL and bounded treewidth queries. In Proceedings
    of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database
    Systems, PODS 2017, pages 201–216, 2017.
 6. Meghyn Bienvenu, Balder ten Cate, Carsten Lutz, and Frank Wolter. Ontology-
    based data access: A study through disjunctive datalog, csp, and MMSNP. ACM
    Trans. Database Syst., 39(4):33:1–33:44, 2014.
 7. Alexander Borgida, Jos de Bruijn, Enrico Franconi, Inanç Seylan, Umberto Strac-
    cia, David Toman, and Grant E. Weddell. On finding query rewritings under
    expressive constraints. In SEBD, pages 426–437, 2010.
 8. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini,
    Antonella Poggi, Mariano Rodriguez-Muro, Riccardo Rosati, Marco Ruzzi, and
    Domenico Fabio Savo. The MASTRO system for ontology-based data access. Se-
    mantic Web, 2(1):43–53, 2011.
 9. Diego Calvanese, Giuseppe de Giacomo, Domenico Lembo, Maurizio Lenzerini,
    and Riccardo Rosati. Tractable Reasoning and Efficient Query Answering in De-
    scription Logics: The DL-Lite Family. J. of Automated Reasoning, 39(3):385–429,
    2007.
10. Jan Chomicki. Depth-bounded bottom-up evaluation of logic program. J. Log.
    Program., 25(1):1–31, 1995.
11. Keith L. Clark. Negation as failure. In Logic and Data Bases, Symposium on Logic
    and Data Bases, Centre d’études et de recherches de Toulouse, France, 1977, pages
    293–322, 1977.
12. William Craig. Three uses of the Herbrand-Genzen theorem in relating model
    theory and proof theory. Journal of Symbolic Logic, 22:269–285, 1957.
13. Francesco M. Donini and Fabio Massacci. EXPTIME tableaux for ALC. Artif.
    Intell., 124(1):87–138, 2000.
14. Peter Hansen, Carsten Lutz, Inanç Seylan, and Frank Wolter. Efficient query
    rewriting in the description logic EL and beyond. In Proceedings of the Twenty-
    Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, pages
    3034–3040, 2015.
15. Ian Horrocks, Ulrike Sattler, Sergio Tessaris, and Stephan Tobies. How to decide
    query containment under constraints using a description logic. In Logic for Pro-
    gramming and Automated Reasoning, 7th International Conference, LPAR 2000,
    Reunion Island, France, November 11-12, 2000, Proceedings, pages 326–343, 2000.
16. Alexander K. Hudek, David Toman, and Grant E. Weddell. On enumerating query
    plans using analytic tableau. In Automated Reasoning with Analytic Tableaux and
    Related Methods - 24th International Conference, TABLEAUX 2015, Wroclaw,
    Poland, September 21-24, 2015. Proceedings, pages 339–354, 2015.
17. Vitaliy L. Khizder, David Toman, and Grant Weddell. Reasoning about Duplicate
    Elimination with Description Logic. In DOOD, pages 1017–1032, 2000.
18. Stanislav Kikot, Roman Kontchakov, Vladimir V. Podolskii, and Michael Za-
    kharyaschev. Long rewritings, short rewritings. In Proceedings of the 2012 In-
    ternational Workshop on Description Logics, DL-2012, Rome, Italy, June 7-10,
    2012, 2012.
19. Roman Kontchakov, Carsten Lutz, David Toman, Frank Wolter, and Michael Za-
    kharyaschev. The combined approach to query answering in DL-Lite. In Proc.
    KR, pages 247–257, 2010.
20. Roman Kontchakov, Carsten Lutz, David Toman, Frank Wolter, and Michael Za-
    kharyaschev. The combined approach to ontology-based data access. In Proc. Int.
    Joint Conf. on Artificial Intelligence (IJCAI), pages 2656–2661, 2011.
21. Carsten Lutz and Leif Sabellek. Ontology-mediated querying with the description
    logic EL: trichotomy and linear datalog rewritability. In Proceedings of the Twenty-
    Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, pages
    1181–1187, 2017.
22. Carsten Lutz, Inanç Seylan, David Toman, and Frank Wolter. The combined
    approach to OBDA: Taming role hierarchies using filters. In ISWC (1), pages
    314–330, 2013.
23. Carsten Lutz, David Toman, and Frank Wolter. Conjunctive query answering in
    the description logic EL using a relational database system. In Proc. Int. Joint
    Conf. on Artificial Intelligence (IJCAI), pages 2070–2075, 2009.
24. Carsten Lutz and Frank Wolter. Non-uniform data complexity of query answering
    in description logics. In Principles of Knowledge Representation and Reasoning:
    Proceedings of the Thirteenth International Conference, KR 2012, 2012.
25. Stephanie McIntyre, Alexander Borgida, David Toman, and Grant E. Weddell. On
    limited conjunctions and partial features in parameter-tractable feature logics. In
    The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, pages
    2995–3002, 2019.
26. Stephanie McIntyre, David Toman, and Grant E. Weddell. FunDL - A family
    of feature-based description logics, with applications in querying structured data
    sources. In Description Logic, Theory Combination, and All That - Essays Dedi-
    cated to Franz Baader on the Occasion of His 60th Birthday, pages 404–430, 2019.
27. Jason St. Jacques, David Toman, and Grant E. Weddell. Object-relational queries
    over CFDI nc knowledge bases: OBDA for the SQL-Literate. In Proc. IJCAI, pages
    1258–1264, 2016.
28. David Toman and Grant E. Weddell. On keys and functional dependencies as
    first-class citizens in description logics. J. Aut. Reasoning, 40(2-3):117–132, 2008.
29. David Toman and Grant E. Weddell. Fundamentals of Physical Design and Query
    Compilation. Synthesis Lectures on Data Management. Morgan & Claypool Pub-
    lishers, 2011.
30. David Toman and Grant E. Weddell. Conjunctive Query Answering in CFDnc :
    A PTIME Description Logic with Functional Constraints and Disjointness. In
    Australasian Conference on Artificial Intelligence, pages 350–361, 2013.
31. David Toman and Grant E. Weddell. An interpolation-based compiler and opti-
    mizer for relational queries (system design report). In IWIL@LPAR 2017 Workshop
    and LPAR-21 Short Presentations, Maun, Botswana, May 7-12, 2017, 2017.