FO-Rewritability of Expressive
                Ontology-Mediated Queries

                  Cristina Feier, Antti Kuusisto, Carsten Lutz

               Fachbereich Informatik, Universität Bremen, Germany


      Abstract. We show that FO-rewritability of OMQs based on (extensions
      of) ALC and unions of conjunctive queries (UCQs) is decidable and
      2NExpTime-complete. Previously, decidability was only known for atomic
      queries (AQs). On the way, we establish the same results also for monotone
      monadic SNP without inequality (MMSNP) and for monadic disjunctive
      Datalog (MDDLog). We also analyze the shape of FO-rewritings, thus
      making a step towards their actual computation.


1   Introduction

Query rewriting is a widely used technique for efficiently answering ontology-
mediated queries (OMQs) using off-the-shelf database systems. In particular, if
an OMQ is rewritable into a first-order (FO) query, then it can be answered
using a relational database system. It is thus a fundamental problem to design
algorithms that compute an FO-rewriting of a given OMQ. For OMQs based
on expressive DLs such as ALC, finding complete such algorithms turns out to
be a technically very challenging problem; moreover, the desired rewritings are
not always guaranteed to exist. A natural first step is thus to find an algorithm
that decides the existence of a rewriting; in fact, any complete and terminating
algorithm for computing rewritings will implicitly also solve the decision problem
and one can expect to learn important lessons already from the latter case.
    An OMQ is a triple Q = (T , Σ, q) with T a TBox, Σ an ABox signature, and
q a query [5]. We use (L, Q) to denote the OMQ language that consists of all OMQs
where T is formulated in the DL L and q in the query language Q. It has been
shown in [5] that decidability results and tight NExpTime complexity bounds
for FO-rewritability in (ALC, AQ) and related OMQ languages can be obtained
by translating the input OMQ Q into a constraint satisfaction problem (CSP)
whose complement is equivalent to Q and then applying known algorithms that
decide FO-rewritability of the CSP [15]. When atomic queries (AQs) are replaced
with conjunctive queries (CQs) or unions thereof (UCQs), such equivalence-
preserving translation to CSP is no longer possible; instead and as also shown
in [5], one can translate Q into a formula ϕ of the strictly more expressive logical
generalization MMSNP of CSP such that ¬ϕ is equivalent to Q [11]. In contrast
to the CSP case, though, it was not known whether FO-rewritability of MMSNP
formulas is decidable. In this abstract, we show that this is the case and that
the complexity is 2NExpTime-complete, with the lower bounds coming from [7].
We then lift this result to FO-rewritability of OMQs formulated in any OMQ
language between (ALCI, UCQ) and (SHI, UCQ). Technically, our approach
consists in a reduction of the MMSNP case to the CSP case. We also analyze
the shape of the rewritings, which we hope will provide guidance for finding
algorithms that compute actual rewritings. One can prove analogous results
for rewritability into monadic Datalog in a very similar way. We do not report
about details here for lack of space. We are currently working on the unrestricted
Datalog case, which is more challenging.
   Related work. FO-rewritability was first studied in an OMQ context for
the inexpressive DL-Lite family of DLs [1, 9, 14, 17]. FO-rewritability in OMQ
languages based on more expressive Horn DLs has been investigated in [3, 4, 12].
Rewritability of Horn DLs into Datalog was considered in [10, 13, 18–20].


2    Results

Instead of working with MMSNP, we prefer monadic disjunctive Datalog, MDDLog.
It was observed in [5] that these have the same expressive power up to comple-
mentation and can be mutually translated in polynomial time. We refer to [5]
for full definitions and notation. An MDDLog program Π is Boolean if it only
returns yes/no answers. The diameter of Π is the maximum number of variables
occuring in a rule in Π. A generalized CSP is defined by a set of templates S
instead of a single template and asks for a homomorphism from the input I to
at least one template T ∈ S (a template is simply a finite relational structure).
With coCSP, we mean the complement of a (potentially generalized) CSP. The
girth of a structure is the length of a smallest cycle in it and ∞ if there is no
cycle, see e.g. [7] for details.
    We start with Boolean MDDLog programs. Our proofs make use of the
classical translation of MMSNP sentences into generalized CSPs first described
by Feder and Vardi and stated in the following in terms of MDDLog.
Theorem 1 ([11]). Given a Boolean MDDLog program Π over EDB schema
SE of diameter k, one can effectively construct a set of templates SΠ over a
different EDB schema S0E such that

 1. every finite SE -structure I can be converted in polytime into an S0E -structure
    I 0 such that I |= Π iff I 0 ∈
                                 / CSP(SΠ );
 2. every finite S0E -structure I 0 of girth exceeding k can be converted in polytime
    into an SE -instance I such that I |= Π iff I 0 ∈  / CSP(SΠ ).

Exact size bounds on the set SΠ and its elements are given in [7]; although they
are needed for our final result, we omit them from this abstract for brevity.
   We next observe that the reduction described in Theorem 1 preserves FO-
rewritability, up to a certain gap related to inputs of small girth. In the following,
we will consider UCQ-rewritability instead of FO-rewritability since for MDDLog,
coCSP, and all other formalisms considered here FO-rewritability implies UCQ-
rewritability.
Proposition 1. Let Π be a Boolean MDDLog program of diameter k and let
SΠ be as in Theorem 1. Then
 1. every UCQ-rewriting of coCSP(SΠ ) can be converted into a UCQ-rewriting
    of Π in polynomial time;
 2. every UCQ-rewriting of Π can effectively be converted into a UCQ-rewriting
    of coCSP(SΠ ) on structures of girth exceeding k.
Thus, FO-rewritability of Π is equivalent to FO-rewritability of coCSP(SΠ ) on
structures of sufficiently high girth. The ‘high girth’ qualification is removed by
the following observation.
Lemma 1. Let S be a set of templates over schema SE and g ≥ 0. Then, if
coCSP(S) is UCQ-definable on finite SE -structures of girth exceeding g, it is
UCQ-definable on (unrestricted) finite SE -structures.
Analyzing the involved blowups and exploiting that FO-rewritability of generalized
CSPs can be decided in NP [5,15] and that FO-rewritability of Boolean MDDLog
programs is 2NExpTime-hard [7], we obtain the following.
Theorem 2. FO-rewritability of Boolean MDDLog programs is decidable in
2 NExpTime, thus 2 NExpTime-complete.
The limitation to Boolean programs can be lifted by a simple reduction which
replaces answer variables with fresh monadic relation symbols. Moreover, it
was shown in [7] that OMQs formulated in (SHI, UCQ) can be translated into
MDDLog with a blowup that is double exponential, but does not add up with
the blowup caused by the translation of MDDLog programs to generalized CSPs.
Finally, [7] also establishes a 2NExpTime lower bound for FO-rewritability in
(ALCI, CQ) and (ALC, UCQ). We can thus extend Theorem 2 as follows.
Theorem 3. FO-rewritability is 2 NExpTime-complete in MMSNP, in MDDLog,
and in OMQ languages between (ALC, UCQ) and (SHI, UCQ) as well as between
(ALCI, CQ) and (SHI, UCQ) (without transitive roles in UCQs).
We now consider the shape of rewritings. For brevity, we concentrate on Boolean
queries. An analysis of the proof of Proposition 1 yields the following.
Proposition 2. Every FO-rewritable Boolean MDDLog program of diameter k
has a rewriting of the form q1 ∨ · · · ∨ qn with each qi a CQ of treewidth bounded by
(1, k). The same holds for Boolean OMQs from the OMQ languages in Theorem 3.
See [6] for a definition of treewidth (1, k). We believe that Proposition 2 is
interesting for at least two reasons. Firstly, it says that when constructing
rewritings, it is enough to look for a structurally very restricted type of UCQ
instead of for an unrestricted FO formula. And secondly because it clarifies the
shape of obstructions of FO-rewritable MMSNP sentences in the spirit of CSP
obstructions [8]. In particular, it is interesting to contrast Proposition 2 with the
fact that if a CSP is FO-rewritable, then it has a finite set of obstructions that
are finite trees, that is, its complement is rewritable into a UCQ that consists of
tree-shaped CQs [2, 16].
Acknowledgements. We thank Manuel Bodirsky and Florent Madelaine for
inspiring discussions and the ERC for funding this work under grant 647289.
References
 1. Alessandro Artale, Diego Calvanese, Roman Kontchakov, and Michael Za-
    kharyaschev. The DL-Lite family and relations. J. of Art. Int. Res. (JAIR),
    36:1–69, 2009.
 2. Albert Atserias. On digraph coloring problems and treewidth duality. Eur. J.
    Comb., 29(4):796–820, 2008.
 3. Meghyn Bienvenu, Carsten Lutz, and Frank Wolter. Deciding FO-rewritability in
    EL. In Proc. of DL, pages 70–80, 2012.
 4. Meghyn Bienvenu, Carsten Lutz, and Frank Wolter. First-order rewritability of
    atomic queries in Horn description logics. In Proc. of IJCAI, 2013.
 5. Meghyn Bienvenu, Balder ten Cate, Carsten Lutz, and Frank Wolter. Ontology-
    based data access: A study through disjunctive Datalog, CSP, and MMSNP. ACM
    Trans. Database Syst., 39(4):33:1–33:44, 2014.
 6. Manuel Bodirsky and Vı́ctor Dalmau. Datalog and constraint satisfaction with
    infinite templates. J. Comput. Syst. Sci., 79(1):79–100, 2013.
 7. Pierre Bourhis and Carsten Lutz. Containment in monadic disjunctive Datalog,
    MMSNP, and expressive description logics. In Proc. of KR, 2016.
 8. A.A. Bulatov, A. Krokhin, and B. Larose. Dualities for constraint satisfaction
    problems. Complexity of Constraints, 5250 LNCS:93–124, 2008.
 9. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and
    Riccardo Rosati. Tractable reasoning and efficient query answering in description
    logics: The DL-Lite family. J. Autom. Reasoning, 39(3):385–429, 2007.
10. Thomas Eiter, Magdalena Ortiz, Mantas Simkus, Trung-Kien Tran, and Guohui
    Xiao. Query rewriting for Horn-SHIQ plus rules. In Proc. of AAAI. 2012.
11. Tomás Feder and Moshe Y. Vardi. The computational structure of monotone
    monadic SNP and constraint satisfaction: A study through datalog and group
    theory. SIAM J. Comput., 28(1):57–104, 1998.
12. Peter Hansen, Carsten Lutz, İnanç Seylan, and Frank Wolter. Efficient query
    rewriting in the description logic el and beyond. In Proc. of IJCAI, 2015.
13. Mark Kaminski, Yavor Nenov, and Bernardo Cuenca Grau. Computing datalog
    rewritings for disjunctive datalog programs and description logic ontologies. In
    Proc. of RR, pages 76–91, 2014.
14. Stanislav Kikot, Roman Kontchakov, Vladimir V. Podolskii, and Michael Za-
    kharyaschev. Exponential lower bounds and separation for query rewriting. In
    Proc. of ICALP (2), pages 263–274, 2012.
15. Benoit Larose, Cynthia Loten, and Claude Tardif. A characterisation of first-order
    constraint satisfaction problems. Logical Methods in Comp. Sci., 3(4), 2007.
16. Jaroslav Nešetřil and Claude Tardif. Duality theorems for finite structures (charac-
    terising gaps and good characterisations). J. Comb. Theory, Ser. B, 80(1):80–97,
    2000.
17. Héctor Pérez-Urbina, Boris Motik, and Ian Horrocks. A comparison of query
    rewriting techniques for DL-Lite. In Proc. of DL, 2009.
18. Héctor Pérez-Urbina, Boris Motik, and Ian Horrocks. Tractable query answering
    and rewriting under description logic constraints. J. of Applied Logic, 8(2):186–209,
    2010.
19. Riccardo Rosati. On conjunctive query answering in EL. In Proc. of DL, pages
    451–458, 2007.
20. Despoina Trivela, Giorgos Stoilos, Alexandros Chortaras, and Giorgos B. Stamou.
    Optimising resolution-based rewriting algorithms for OWL ontologies. J. Web Sem.,
    33:30–49, 2015.