Polynomial Disjunctive Datalog Rewritings of
          Instance Queries in Expressive
                Description Logics

          Shqiponja Ahmetaj, Magdalena Ortiz, and Mantas Šimkus

               Institute of Information Systems, TU Vienna, Austria


      Abstract. Rewriting ontology mediated queries (OMQs) into traditional
      query languages like FO-queries and Datalog is central for understanding
      their relative expressiveness, and plays a crucial role in the ongoing efforts
      to develop OMQ answering tools by reusing existing database technologies.
      However, the vast majority of works focus on Horn ontologies, and for
      OMQs where the ontologies are written in extensions of ALC, only a
      couple of exponential rewritings into disjunctive Datalog are known.
      In this paper we propose a translation of instance queries mediated by
      ontologies in the expressive DL ALCHI, into polynomial-sized disjunctive
      Datalog programs. The translation is based on a simple game-like
      algorithm, and can be extended to accommodate nominals. We can
      also rewrite OMQs with closed-predicates into Datalog programs with
      (limited) negation. Closed predicates are useful for combining complete
      and incomplete information, but make OMQs non-monotonic and thus
      not rewritable into positive disjunctive Datalog.


1    Introduction

In ontology-mediated queries (OMQs), a database query is enriched with an
ontology, providing knowledge to obtain more complete answers from incomplete
data. OMQs are receiving much attention in the database and knowledge repre-
sentation research communities, particularly when the ontological knowledge is
expressed in Description Logics (DLs) or in rule-based formalisms like existential
rules and Datalog±, see e.g., [4, 3, 13] and their references. One of the most
central questions in OMQ research is rewritability: given an OMQ Q, specified
by a query and an ontology, obtain a query Q0 —in some standard database
language, like first-order (FO) queries or (fragments of) Datalog—such that,
for any dataset A, the certain answers to Q and Q0 coincide. Such a Q0 is called
a rewriting of Q. Its existence and size are crucial for understanding the relative
expressiveness of different families of OMQs in terms of more traditional query
languages. Rewritings are also very relevant in practice, since they allow to reuse
existing database technologies to support OMQ answering.
    Research into OMQs that can be rewritten into first-order (FO) queries has
produced the successful DL-Lite family [5] of DLs. The succinctness of rewritings
for OMQs consisting of the so-called (unions of ) conjunctive queries (UCQs)
paired with ontologies in DL-Lite, or in related families of existential rules, has
been extensively studied. It has been shown that polynomially sized non-recursive
Datalog queries can be constructed for a few well known ontology languages,
formulated as DLs or existential rules [14]; but for many other cases, the lack of
succinctness of such rewritings has also been established [11], and some authors
have considered rewritings that, unlike the ones we consider here, are not data-
independent [18, 12]. For DLs beyond DL-Lite, which are often not FO-rewritable,
rewritings of UCQs into Datalog queries have been proposed, and lie at the
core of implemented systems, e.g., [24, 9, 25]. The pioneering work in [15] showed
that instance queries in an expressive extension of ALC can be rewritten into
a program in disjunctive Datalog, using a constant number of variables per
rule, but exponentially many rules. The first translation from CQs in expressive
DLs (SH, SHQ) to disjunctive Datalog programs was introduced in [8], but
the program may contain double exponentially many predicates. For ALC and
UCQs, the existence of exponential rewritings in disjunctive Datalog was shown
recently [4]. For restricted fragments of SHI and classes of CQs translations to
Datalog were investigated in [16, 17]. A polynomial time Datalog translation
of instance queries has been proposed in [23], but for a so-called Horn-DL that
lacks disjunction. To our knowledge, this was until now the only polynomial
rewriting for a DL that is not FO-rewritable.
    The main contribution of this paper is a polynomial time translation of
instance queries mediated by ALCHI TBoxes into disjunctive Datalog. The
translation is based on a simple game-theoretic characterization, and implements
a so-called type-elimination procedure using a polynomially sized Datalog
program. To our knowledge, this is the first polynomial time translation of an
expressive (non-Horn) DL into disjunctive Datalog. The translation we present
here can be extended to richer OMQs, and in particular, to nominals and closed
predicates. In fact, we report here a simplified version of the translation in [1],
which is a polynomial translation of instance queries mediated by ALCHOI
TBoxes with closed predicates into disjunctive Datalog queries with negation.


2    Preliminaries

We give some basic notions of DLs and Datalog.
The DL ALCHI We assume countably infinite, mutually disjoint sets NR of
role names, NC of concept names, and NI of individual names. A role r is either a
role name p, or an expression p− , called the inverse of p. We let r− = p if r = p− .
Concepts are defined inductively: (i) >, ⊥ and all A ∈ NC are concepts; (ii) if
C1 , C2 are concepts, then C1 u C2 , C1 t C2 and ¬C1 are concepts; (iii) if r is a
role, and C is a concept, then ∃r.C, ∀r.C are concepts. A concept inclusion is
an expression of form C1 v C2 , where C1 , C2 are concepts. A role inclusion is
an expression of form r1 v r2 , where r1 , r2 are roles. A TBox T is a finite set of
(concept and role) inclusions. An ABox A is a finite set of assertions of the forms
A(a) and p(a, b), where {a, b} ⊆ NI , A ∈ NC , and p ∈ NR . A knowledge base (KB)
is a tuple K = (T , A), where T is a TBox and A is an ABox.
     An interpretation is a pair I = h∆I , ·I i, where ∆I is a non-empty set (called
the domain), AI ⊆ ∆I for each A ∈ NC , rI ⊆ ∆I × ∆I for each r ∈ NR , and
aI ∈ ∆I for each a ∈ NI . The function ·I is extended to the remaining concepts
and roles in the standard way [2]. An interpretation I satisfies an inclusion
q1 v q2 , if q1I ⊆ q2I , in symbols I |= q1 v q2 ; and it satisfies an assertion q(a) if
(a)I ∈ q I , in symbols, I |= q(a). For Γ a TBox or ABox, we write I |= Γ if
I |= α for all α ∈ Γ . For a KB K = (T , A), we write I |= K if the following
hold: (i) a ∈ ∆I and aI = a for each a ∈ NI occurring in K,1 (ii) I |= T , and
(iii) I |= A. For an assertion α, we write K |= α if I |= α for all I with I |= K.
Instance Queries An (ontology mediated) instance query is a tuple Q = (T , q),
where T is a TBox, and q ∈ NC ∪ NR . Let a ∈ NI in case q ∈ NC , and a ∈ N2I
otherwise. Then a is a certain answer to Q over an ABox A if (T , A) |= q(a). To
ease presentation, we assume that every concept and role of A also occurs in T .
Normal Form Our results apply to arbitrary TBoxes, but to simplify pre-
sentation, we consider TBoxes in normal form where inclusions take one of the
following forms:

          (N1) A1 u · · · u An v An+1 t · · · t Ak                (N3) A v ∀r.A0
          (N2) A v ∃r.A0                                          (N4) r v s

with r, s roles, {A1 , . . . , Ak } ⊆ NC , and {A, A0 } ⊆ NC . We also assume that T is
closed under role inclusions as follows: (a) p v p ∈ T , for each p ∈ NR occurring
in T , (b) if r v s ∈ T , then r− v s− ∈ T , and (c) if r1 v r2 ∈ T and r2 v r3 ∈ T ,
then r1 v r3 ∈ T . For Γ a TBox, ABox, or KB, we denote by NI (Γ ), NR (Γ ),
NC (Γ ) the set of individuals, role and concept names that occur in Γ , resp.
Disjunctive Datalog We assume countably infinite sets NP and NV of predicate
symbols (each with an associated arity) and variables, respectively. We further
assume that NC ∪ NR ⊆ NP with each A ∈ NC being unary, and each r ∈ NR being
binary. An atom is an expression of the form R(t1 , . . . , tn ), where {t1 , . . . , tn } ⊆
NI ∪ NV , and R is an n-ary relation symbol. A rule ρ is an expression of the
form h1 ∨ . . . ∨ hn ← b1 , . . . , bk , where H = {h1 , . . . , hn } and B = {b1 , . . . , bk }
are sets of atoms, called the head and the body of ρ, respectively. Each variable
that occurs in H must also occur in B. Rules of the form h ← (known as facts)
are simply identified with the atom h, thus ABox assertions are valid facts
in our syntax. For a role name p, we may use p− (t1 , t2 ) to denote the atom
p(t2 , t1 ). A program is any finite set P of rules. We use ground(P ) to denote the
grounding of P , i.e. the variable-free program that is obtained from P by applying
on its rules all the possible substitutions of variables by individuals of P . An
(Herbrand) interpretation (or, database) I is any finite set of variable-free (or,
ground ) atoms. An interpretation I is a model of a program P if {b1 , . . . , bk } ⊆ I
implies I ∩ {h1 , . . . , hn } =
                               6 ∅ for all rules h1 ∨ . . . ∨ hn ← b1 , . . . , bk in ground(P ).
A ( Datalog) query is a pair (P, q), where P is a program, and q is a predicate
1
    This is called the standard name assumption (SNA), which we apply to the individuals
    occurring in K.
symbol from P . A tuple a of constants is a certain answer to (P, q) over a
database I if q(a) ∈ J for all models J of P ∪ I.


3    Characterization of Counter Models
Assume a KB K = (T , A) and an assertion β. Towards deciding K 6|= β using a
polynomially sized program, we decompose the problem into two steps:
(1) Guess a core interpretation Ic for K, whose domain is NI (A). Core interpreta-
    tions fix how the individuals of K participate in concepts and roles, ensuring
    the satisfaction of A, and the non-entailment of β.
(2) Check that Ic can be extended to satisfy all axioms in T .
Defining Datalog rules that do (1) is not hard, but (2) is more challenging, and
will rely on a game-theoretic characterization we describe below. But first we
need to define core interpretations.

Definition 1. A core interpretation for a KB K = (T , A) is any interpretation
Ic such that
(c1) ∆Ic = NI (A) and aIc = a for all a ∈ NI (A),
(c2) Ic |= A,
(c3) Ic |= A v ∀r.A0 for all A v ∀r.A0 ∈ T ,
(c4) Ic |= r v s for all r v s ∈ T ,
(c5) q Ic = ∅ for each q 6∈ NC (T ) ∪ NR (T ).
An interpretation J is called an extension of Ic , if Ic is the result of restricting
J to NI (A).

    A core and its extensions coincide on the assertions they entail, and deciding
non-entailment of an instance query q amounts to deciding whether there is a
core that does not entail q, and that can be extended into a model. But verifying
whether a core can be extended into a full model is hard: it corresponds to testing
consistency (of Ic viewed as an ABox) with respect to T , an ExpTime-hard
problem already for fragments of ALCHI. In order to obtain a polynomial set
of rules that solves this ExpTime-hard problem, we characterize it as a game,
revealing a simple algorithm that admits an elegant implementation in Datalog.

Definition 2. A type (over a TBox T ) τ is a subset of NC (T ) ∪ {>} such that
⊥ 6∈ τ and > ∈ τ . We say that τ satisfies an inclusion α = A1 u · · · u An v An+1 t
· · · t Ak of type (N1), if {A1 , . . . , An } ⊆ τ implies {An+1 , . . . , Ak } ∩ τ =
                                                                                    6 ∅;
otherwise τ violates α. For an element e ∈ ∆I in an interpretation I, we let
type(e, I) = {B ∈ NC | e ∈ B I }. A type τ is realized in I if there is some e ∈ ∆I
s.t. type(e, I) = τ .

    We now describe a game to decide whether a given core Ic can be extended
into a model of a KB K. The game is played by Bob (the builder), who wants
to extend Ic into a model, and Sam (the spoiler), who wants to spoil all Bob’s
attempts. Sam starts by picking an individual a, and they look at its type
type(a, Ic ). If it doesn’t satisfy the inclusions of type (N1) Sam wins. Otherwise,
in each turn Sam chooses an inclusion of the form A v ∃rA0 which would need to
be satisfied by (an element with) the current type, forcing Bob to pick a type for
the corresponding r-successor that satisfies A0 . The game continues for as long
as Bob can respond to the challenges of Sam.
    The game on I starts by Sam choosing an individual a ∈ ∆I , and τ =
type(a, I) is set to be the current type. Then:
() If τ does not satisfy all inclusions of type (N1) in T , then Sam is declared
    the winner. Otherwise, Sam chooses an inclusion A v ∃r.A0 ∈ T with A ∈ τ ;
    if there is no such inclusion, Bob is declared the winner. Otherwise, Bob
    chooses a new type τ 0 such that:
  (C1) A0 ∈ τ 0 , and
  (C2) for all inclusions A1 v ∀s.A2 ∈ T :
           • if r v s ∈ T and A1 ∈ τ then A2 ∈ τ 0 ,
           • if r− v s ∈ T and A1 ∈ τ 0 then A2 ∈ τ .
      0
    τ is set to be the current type, and the game continues with a new round,
    i.e. we go back to .
   A run of the game on I is a (possibly infinite) sequence
                                     aα1 τ1 α2 τ2 . . .
where a is the individual picked initially by Sam, and each αi and τi are the
inclusion picked by Sam and the type picked by Bob in round i, respectively. A
strategy for Bob is a partial function str that maps pairs of a type τ and an inclu-
sion A v ∃r.A0 with A ∈ τ to a type τ 0 that satisfies (C1) and (C2); intuitively,
it gives a move for Bob in response to moves of Sam. A run aα1 τ1 α2 τ2 . . . with
type(a, I) = τ0 follows a strategy str if τi = str(τi−1 , αi ) for every i ≥ 1.
    For a finite run w, we let tail(w) = type(a, I) if w = a, and tail(w) = τ` if
w = a . . . α` τ` with ` ≥ 1. The strategy str is called non-losing on I if for every
finite run w that follows str, tail(w) satisfies all inclusions of type (N1) in T ,
and str(tail(w), A v ∃r.A0 ) is defined for every A v ∃r.A0 ∈ T with A ∈ tail(w).
Theorem 1. Assume a KB K and an assertion β. Then K 6|= β iff there is a
core interpretation Ic for K such that:
(1) Ic 6|= β, and
(2) there is a non-losing strategy for Bob on Ic .
Proof. (Sketch.) We focus on showing that there is a non-losing strategy str for
Bob on Ic iff there exists an extension J of Ic s.t. J |= K. The claim follows
from this, and the easy claim that extensions preserve non-entailment of β.
    For the “⇒” direction, from an arbitrary non-losing str for Ic , we build J as
follows. We denote by frn(Ic , str) the set of all finite runs aα1 τ1 . . . αi τi , i ≥ 0
that follow str. The domain of J is:
                                  ∆J = frn(Ic , str)
and for each a ∈ NI , each A ∈ NC , and each p ∈ NR we let:

                  aJ = aIc                 AJ = {w | A ∈ tail(w)}
                  pJ = pIc ∪ {(w, wαi τi ) | ri v p ∈ T } ∪
                               {(wαi τi , w) | ri− v p ∈ T }

where αi = A v ∃ri .A0 ∈ T and {w, wαi τi } ⊆ ∆J .
     It is not hard to see that J is an extension of Ic . We show that J is a model
of K. First, J |= A by definition of Ic and the fact that J is an extension of
Ic . It is left to prove that J satisfies all inclusions in T . Note that for each
w ∈ ∆J , type(w, J ) = tail(w). That J satisfies all inclusions of type (N1) holds
by definition of non-losing str and the fact that tail(w) satisfies all such inclusions.
For the inclusions α = A v ∃r.A0 of type (N2), we see that for each w ∈ ∆J ,
if w ∈ AJ , then A ∈ tail(w) and w is a run that follows str. Hence, since str is
a non-losing strategy str(tail(w), α) is defined, that is there exists a τ 0 over T
such that str(tail(w), α) = τ 0 and A0 ∈ τ 0 . Thus wατ 0 is a run that follows str,
i.e. wατ 0 ∈ frn(Ic , str), so wατ 0 is in A0J and (w, wατ 0 ) ∈ rJ if r is a role name,
and (wατ 0 , w) ∈ rJ otherwise. For the inclusions A1 v ∀r.A2 of type (N3), we
distinguish the case of (a, a0 ) ∈ rIc for a pair a, a0 of individuals in ∆Ic , for which
a ∈ AJ              0      J
        1 implies a ∈ A2 holds by definition of Ic , and the case of an arbitrary
pair of objects (w, w ) ∈ rIc where w0 is a child of w (that is, w0 is of the form
                         0

wαi τi ), or vice-versa, for which w0 ∈ AJ                         J
                                              2 whenever w ∈ A1 is ensured by the
condition (C2) in the definition of the game, the fact that str is a non-losing
strategy, and the construction of J . Finally, the inclusions of type (N4) are also
guaranteed by the definition of the core Ic for pairs of individual names, and by
the condition (C2) in the definition of the game for arbitrary pairs of an object
and its child.
     For “⇐”, assume an arbitrary model J of K that is an extension of Ic . We can
extract from it a non-losing strategy for Bob as follows. First, let T be a the set of
all the types realized in J , and observe that each type in T satisfies all inclusions
of type (N1) and that for all a ∈ NI (A), type(a, J ) ∈ T . Then it suffices to set,
for each τ ∈ T and each A v ∃r.A0 ∈ T with A ∈ τ , str(τ, A v ∃r.A0 ) = τ 0 for an
arbitrarily chosen τ 0 ∈ T that satisfies (C1) and (C2), which exists because J is a
model, hence it satisfies all existential and universal inclusions in T , and all types
realized in J are contained in T . This str is a non-losing strategy for Bob. t         u

    To decide whether Bob has a non-losing strategy on a given core we use the
type elimination procedure Mark in Algorithm 1, which marks (or, eliminates)
all types from which Sam has a winning strategy. It takes as input the TBox T ,
and an interpretation I which intuitively is the core being checked. The algorithm
starts by building the set N of all possible types over T , and then it eliminates
bad choices (from which Bob would loose) by marking them. In step (MN1 ), the
algorithm marks in N all types that violate some inclusion of type (N1) in T ;
Sam wins already in the first round on these types. Then, in the loop, (M∃ )
exhaustively marks types τ that allow Sam to pick an inclusion A v ∃r.A0 for
    Algorithm 1: Mark
     input : TBox T , interpretation I
     output : Set of (possibly marked) types
     N ← {τ | τ is a type over T }
     (MN1 ) Mark each τ ∈ N that violates some inclusion of the form (N1) in T
     repeat
        (M∃ ) Mark each τ ∈ N such that A v ∃r.A0 ∈ T , A ∈ τ , and for each τ 0 ∈ N ,
        at least one the following holds:
          (C0)     τ 0 is marked,
              0
          (C1 )    A0 ∈
                      / τ 0 , or
          (C20 )   there exists A1 v ∀s.A2 ∈ T with
                                         / τ 0 , or
           – r v s ∈ T and A1 ∈ τ and A2 ∈
              −                    0
           – r v s ∈ T and A1 ∈ τ and A2 ∈ /τ

     until no new type is marked
     return N

which Bob cannot reply with any τ 0 . At the end the marked types, are those
from which Bob can lose, so we get:
Theorem 2. Let Ic be a core interpretation. Then Bob has a non-losing strategy
on Ic iff none of the types realized in Ic is marked by Mark(T , Ic ).
Proof. (Sketch.) For the “⇒” direction, we can show (by induction in the number
of iterations of Mark(T , Ic )) that if a type is marked, then it cannot occur in a
non-losing str for Bob. For the “⇐” direction, a non-losing str for Ic is obtained
by taking all unmarked τ ∈ N , and for each of them, and each A v ∃r.A0 ∈ T
with A ∈ τ , setting str(τ, A v ∃r.A0 ) = τ 0 for an arbitrary unmarked τ 0 that
satisfies (C1) and (C2).                                                         t
                                                                                 u

4     Translation into Datalog
Assume an instance query (T , q). We build next a polynomially sized program
P such that the queries (T , q) and (P, q) have the same certain answers for all
ABoxes over the signature of T . Roughly, the program P has 3 major components:
(a) rules to non-deterministically generate a core interpretation Ic for the KB
(T , A), where A is an input ABox; (b) rules that implement the type elimination
algorithm presented in the previous section; (c) rules that glue (a) and (b) together,
ensuring that all types that occur in Ic are not marked by the marking procedure.
We remark that the construction of P is independent from any particular ABox.
(I) Collecting the individuals We first add rules to collect in the unary
predicate ind all the individuals that occur in the input ABox. For each A ∈ NC (T ),
r ∈ NR (T ), we have:
          ind(x) ← A(x)              ind(x) ← r(x, y)        ind(y) ← r(x, y)
(II) Generating core interpretations For each A ∈ NC (T ) (resp., r ∈
NR (T )) we will use a fresh concept name A (resp., role name r). We add the
following rules to P :
                A(x) ∨ A(x) ← ind(x)                               A ∈ NC (T )
                                  ← A(x), A(x)                     A ∈ NC (T )
             r(x, y) ∨ r(x, y) ← ind(x), ind(y)                    r ∈ NR (T )
                               ← r(x, y), r(x, y)                  r ∈ NR (T )
To ensure (c3) in Definition 1, for each A v ∀r.A0 ∈ T we add the rule
A0 (y) ← A(x), r(x, y). Then, to ensure (c4), for each role inclusion r v s ∈ T
we add the rule s(x, y) ← r(x, y).
    Intuitively, the stable models of the above rules generate the different core
interpretations Ic of the KB K = (T , A) for any given A.
    We next implement the algorithm Mark from Section 3. To obtain a poly-
nomially sized program, we need to use non-ground rules whose number of
variables depends on the number of different concept names in T . Assume an
arbitrary enumeration A1 , . . . , Ak of NC (T ). Assume also a pair 0, 1 of special
individuals. Intuitively, we will use a k-ary relation Type = {0, 1}k to store the
set of all types over T . Naturally, a k-tuple (a1 , . . . , ak ) ∈ Type encodes the type
τ = {Ai | ai = 1, 1 ≤ i ≤ k} ∪ {>}. We are most interested in computing a
k-ary relation Marked ⊆ {0, 1}k that contains precisely the types marked by the
Mark algorithm. We next define the rules to compute Type and Marked, and
other relevant relations.
(III) A linear order over types The first ingredient is a linear order over
all possible types. To this end, for every 1 ≤ i ≤ k, we inductively define i-ary
relations firsti and lasti , and a 2i-ary relation nexti , which will provide the first,
the last and the successor elements from a linear order on {0, 1}i . In particular,
given u, v ∈ {0, 1}i , the fact nexti (u, v) will be true if v follows u in the ordering
of {0, 1}i . The rules to populate nexti are quite standard (see, e.g., Theorem 4.5
in [6]). For the case i = 1, we simply add the following facts:
                 first1 (0) ←          last1 (1) ←           next1 (0, 1) ←
Then, for all 1 ≤ i ≤ k − 1 we add the following rules:
                       nexti+1 (0, x, 0, y) ← nexti (x, y)
                       nexti+1 (1, x, 1, y) ← nexti (x, y)
                       nexti+1 (0, x, 1, y) ← lasti (x), firsti (y)
                            firsti+1 (0, x) ← firsti (x)
                                lasti+1 (1, x) ← lasti (x)
We can now collect in the k-ary relation Type all types over T (thus computing
the set N of the Mark algorithm):
                  Type(x) ← firstk (x)         Type(y) ← nextk (x, y)
(IV) Implementing Step (MN1 ) First, we add the auxiliary facts F(0) ← and
T(1) ← to P . For a k-tuple of variables x, we let B ∈ x denote the atom T (xj ),
where j is the index of B in the enumeration of NC (T ). Similarly, we let B 6∈ x
denote the atom F (xj ), where j is the index of B in the enumeration. Then the
step (MN1 ), which marks types violating inclusions of type (N1), is implemented
using the following rule for every inclusion A1 u · · · u An v A01 t · · · t A0m of T :

         Marked(x) ← Type(x), A1 ∈ x, . . . , An ∈ x, A01 6∈ x, . . . , A0m 6∈ x

(V) Implementing Step (M∃ ) The following rules are added for each inclu-
sion α = A v ∃r.A0 ∈ T . Recall that we need to mark a type τ if A ∈ τ , and for
each type τ 0 at least one of (C0), (C10 ) or (C20 ) holds. We first use an auxiliary
(2k + 1) relation MarkedOne to collect all such types τ 0 .
    Assume a fresh individual aα . For collecting each τ 0 that satisfies (C0), we
add:
                    MarkedOne(x, aα , y) ← Type(x), Marked(y)
    For the condition (C10 ), we add the rule:

                 MarkedOne(x, aα , y) ← Type(x), Type(y), A0 6∈ y

   The rules for (C20 ) are as follows. For all inclusions A1 v ∀r.A2 ∈ T with
r v s ∈ T , we add the rule

             MarkedOne(x, aα , y) ← Type(x), Type(y), A1 ∈ x, A2 6∈ y

For all A1 v ∀r.A2 ∈ T with r− v s ∈ T , we also add

             MarkedOne(x, aα , y) ← Type(x), Type(y), A1 ∈ y, A2 6∈ x

   We infer Marked(t) in case MarkedOne(t, aα , v) is true for all types (bit
vectors) v. To this end, we need another (2k + 1)-ary relation MarkedUntil. We
add:

            MarkedUntil(x, aα , z) ←MarkedOne(x, aα , z), firstk (z)
            MarkedUntil(x, aα , u) ←MarkedUntil(x, aα , z), nextk (z, u),
                                       MarkedOne(x, aα , u)

   Intuitively, with the above rules we traverse all types checking the conditions
(C0), (C10 ), (C20 ) described in (M∃ ). If we manage to reach the last type, we
know the condition is satisfied. We add the following rule:

                Marked(x) ← MarkedUntil(x, aα , z), A ∈ x, lastk (z)

(VI) Forbidding marked types in the core We need to forbid each indi-
vidual in the generated core interpretation from having a type from Marked. For
all 0 ≤ i ≤ k, we take a fresh (i + 1)-ary relation symbol Proji . We first add:

                          Projk (x, y) ← ind(x), Marked(y)
We will now project away bits from the Proji relations by looking at the actual
types of individuals. For all 1 ≤ i ≤ k we have the following rules:

                        Proji−1 (x, y) ← Proji (x, y, 1), Ai (x)
                        Proji−1 (x, y) ← Proji (x, y, 0), Ai (x)

Intuitively, Proji−1 (a, b1 , . . . , bi−1 ) says the partial type given by the bit values
b1 , . . . , bi−1 can be extended to a marked type by choosing additional concepts
according to the actual type of the individual a. Thus the fact Proj0 (a) represent
the situation where a has a marked type. Such situations are ruled out by adding
the constraint:
                                             ← Proj0 (x)
   This concludes the translation of the instance query (T , q), where T is an
ALCHI TBox, into the disjunctive Datalog program P . This construction,
together with Theorems 1 and 2, yields the main result of this paper.

Theorem 3. For an instance query (T , q), we can build in polynomial time a
Datalog query (P, q) such that the certain answers to (T , q) and (P, q) coincide
for any given ABox A over the signature of T .

    The above encoding employs disjunctive programs. Entailment of ground
atoms in such programs is coNExpTime-complete [7], which does not match the
ExpTime-completeness of satisfiability of ALCHI KBs. However, we employ
disjunction in a limited way, and thus our programs fall into a class of programs
that can be evaluated in (deterministic) exponential time. In particular, the
above program P can be partitioned into two programs P1 , P2 as follows: P1
consists of all rules in (I) and (II), and P2 consists of the remaining rules. Observe
that P1 is a disjunctive program with at most two variables in each rule, while
P2 is is disjunction-free. Note also that P2 does not define any relations used
in P1 , i.e. none of the relation symbols of P1 occurs in the head of a rule in P2 .
Assume a set F of facts over the signature of P1 . Due to the above properties,
the successful runs of the following non-deterministic procedure generate the set
of all minimal models of P ∪ F :

(S1) Compute a minimal model I1 of P1 ∪ F . If I1 does not exist due to a
     constraint violation, then return failure.
(S2) Compute the least model I2 of P2 ∪ I1 . Again, if I2 does not exist due to a
     constraint violation, then return failure. Otherwise, output I2 .

Since P1 has at most two-variables in every rule, each minimal model I1 of
P1 ∪ F is of polynomial size in the size of P1 ∪ F , and the set of all such models
can be traversed in polynomial space. For a given I1 , performing the step (S2)
is feasible in (deterministic) exponential time, because P2 is disjunction-free
and of polynomial size in the size of the original instance query. It follows that
computing the certain answers to (P, q) for any given ABox A over the signature
of T requires single exponential time.
5    Discussion
We have presented our results for ALCHI, but they also apply to SHI, its
extension with transitivity axioms; it is well known that instance queries mediated
by SHI TBoxes can be rewritten in polynomial time to instance queries mediated
by ALCHI TBoxes (see, e.g., [15]). Moreover, we have presented the translation
for instance queries, but the results can be easily generalized, e.g., to DL-safe
rules of [22], or the quantifier-free conjunctive queries like in [21]. These queries
are syntactically restricted to ensure that the relevant variable assignments only
map into individuals of the input ABox. We note that generalizing our translation
to arbitrary conjunctive queries while remaining polynomial is not possible under
common assumptions in complexity theory. Indeed, cautious inference from a
disjunctive program is in coNExpTime, but entailment of (Boolean) conjunctive
queries is 2ExpTime-hard already for the DL ALCI [19].
    As mentioned in the introduction, both the game theoretic characterization
and the Datalog translation can be extended to ontologies in ALCHOI, the
extension of ALCHI with nominals. This is possible even in the presence of
the so-called closed predicates, which are useful for combining complete and
incomplete knowledge, by explicitly specifying predicates assumed complete, thus
given a closed-world semantics [10, 20]. For example, take the following TBox T :
              BScStud v Student       Student v ∃attends.Course
              BScStud v ∀attends.¬GradCourse
and the ABox A:
              Course(c1 ), Course(c2 ), GradCourse(c2 ), BScStud(a)
Then (a, c1 ) is not a certain answer to (T , q) with q = attends(x, y). But if c1 and
c2 are known to be the only courses, then (a, c1 ) should be a certain answer. This
can be achieved by declaring Course a closed predicate: (a, c1 ) is indeed a certain
answer to the OMQ with closed predicates (T , Σ, q), where Σ = {Course}.
    Interestingly, OMQs with closed predicates are non-monotonic. Indeed, (a, c1 )
is a certain answer to (T , Σ, q) over A, but it is not a certain answer over
the extended set of facts A0 = A ∪ {Course(c3 )}. For this reason, these queries
cannot be rewritten into monotonic variants of Datalog, like positive Datalog
with disjunction as considered here. However, one can devise a polynomial time
translation into disjunctive Datalog with negation as failure, for OMQs of the
form (T , Σ, q) where T is an ALCHOI TBox, Σ is a set of closed predicates, and
q is an instance query. The translation uses the same ideas presented here, but
handling closed predicates (both in the game-theoretic characterization, and in the
Datalog translation) implies reasoning about the types that are already realized
in the core; in Datalog, this requires negation. Details can be found in [1].


Acknowledgements

This work was supported by the Austrian Science Fund (FWF): projects P25207-
N23, T515 and W1255-N23.
References

 1. S. Ahmetaj, M. Ortiz, and M. Šimkus. Polynomial datalog rewritings for expressive
    description logics with closed predicates. In Proc. of IJCAI’16, 2016. To appear.
 2. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider, edi-
    tors. The Description Logic Handbook: Theory, Implementation, and Applications.
    Cambridge University Press, second edition, 2007.
 3. M. Bienvenu and M. Ortiz. Ontology-mediated query answering with data-tractable
    description logics. In Proc. of RW 2015, pages 218–307. Springer, 2015.
 4. M. Bienvenu, B. ten Cate, C. Lutz, and F. Wolter. Ontology-based data access:
    A study through disjunctive Datalog, CSP, and MMSNP. ACM Trans. Database
    Syst., 39(4):33:1–33:44, 2014.
 5. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable
    reasoning and efficient query answering in description logics: The DL-Lite family.
    J. Autom. Reasoning, 39(3):385–429, 2007.
 6. E. Dantsin, T. Eiter, G. Gottlob, and A. Voronkov. Complexity and expressive
    power of logic programming. ACM Comput. Surv., 33(3):374–425, 2001.
 7. T. Eiter, G. Gottlob, and H. Mannila. Disjunctive datalog. ACM Trans. Database
    Syst., 22(3):364–418, 1997.
 8. T. Eiter, M. Ortiz, and M. Šimkus. Conjunctive query answering in the description
    logic SH using knots. J. Comput. Syst. Sci., 78(1):47–85, 2012.
 9. T. Eiter, M. Ortiz, M. Šimkus, T. Tran, and G. Xiao. Query rewriting for Horn-SHIQ
    plus rules. In Proc. of AAAI 2012. AAAI Press, 2012.
10. E. Franconi, Y. A. Ibáñez-Garcı́a, and I. Seylan. Query answering with DBoxes is
    hard. Electr. Notes Theor. Comput. Sci., 278:71–84, 2011.
11. G. Gottlob, S. Kikot, R. Kontchakov, V. V. Podolskii, T. Schwentick, and M. Za-
    kharyaschev. The price of query rewriting in ontology-based data access. Artif.
    Intell., 213:42–59, 2014.
12. G. Gottlob, M. Manna, and A. Pieris. Polynomial combined rewritings for existential
    rules. In Proc. of KR 2014. AAAI Press, 2014.
13. G. Gottlob, M. Manna, and A. Pieris. Polynomial rewritings for linear existential
    rules. In Proc. of IJCAI 2015. AAAI Press, 2015.
14. G. Gottlob and T. Schwentick. Rewriting ontological queries into small nonrecursive
    datalog programs. In Proc. of KR 2012. AAAI Press, 2012.
15. U. Hustadt, B. Motik, and U. Sattler. Reasoning in description logics by a reduction
    to disjunctive datalog. J. Autom. Reasoning, 39(3):351–384, 2007.
16. M. Kaminski, Y. Nenov, and B. C. Grau. Computing datalog rewritings for
    disjunctive datalog programs and description logic ontologies. In Proc. of RR 2014,
    pages 76–91, 2014.
17. M. Kaminski, Y. Nenov, and B. C. Grau. Datalog rewritability of disjunctive
    datalog programs and its applications to ontology reasoning. In Proc. of AAAI
    2014, pages 1077–1083, 2014.
18. R. Kontchakov, C. Lutz, D. Toman, F. Wolter, and M. Zakharyaschev. The combined
    approach to ontology-based data access. In Proc. of IJCAI 2011. IJCAI/AAAI,
    2011.
19. C. Lutz. Inverse roles make conjunctive queries hard. In Proc. of DL 2007. CEUR-
    WS.org, 2007.
20. C. Lutz, I. Seylan, and F. Wolter. Ontology-based data access with closed predicates
    is inherently intractable(sometimes). In Proc. of IJCAI 2013. IJCAI/AAAI, 2013.
21. C. Lutz, I. Seylan, and F. Wolter. Ontology-mediated queries with closed predicates.
    In Proc. of IJCAI 2015. IJCAI/AAAI, 2015.
22. B. Motik, U. Sattler, and R. Studer. Query answering for OWL-DL with rules. J.
    Web Sem., 3(1):41–60, 2005.
23. M. Ortiz, S. Rudolph, and M. Šimkus. Worst-case optimal reasoning for the
    Horn-DL fragments of OWL 1 and 2. In Proc. of KR 2010. AAAI Press, 2010.
24. H. Pérez-Urbina, B. Motik, and I. Horrocks. Tractable query answering and
    rewriting under description logic constraints. J. Applied Logic, 8(2):186–209, 2010.
25. D. Trivela, G. Stoilos, A. Chortaras, and G. B. Stamou. Optimising resolution-based
    rewriting algorithms for OWL ontologies. J. Web Sem., 33:30–49, 2015.