Virtual OBDA over Expressive Ontologies:
                  Rewritings and Approximations?

E. Botoeva1 , D. Calvanese1 , V. Santarelli2 , D. F. Savo2 , A. Solimando3 , and G. Xiao1
              1
              Free University of Bozen-Bolzano, Italy, lastname@inf.unibz.it
              2
              Sapienza Università di Roma, Italy, lastname@dis.uniroma1.it
        3
          DIBRIS, University of Genova, Italy, alessandro.solimando@unige.it


          Abstract. In this paper we extend virtual OBDA to more expressive ontology
          languages than that of DL-LiteR , which is the logic underpinning the W3C on-
          tology language OWL 2. We achieve this by relying on two well-known mecha-
          nisms, namely conservative rewriting and approximation, and compiling some of
          the domain semantics into OBDA mappings.


1      Introduction

Ontology-Based Data Access (OBDA) is a popular paradigm that enables end users to
access data sources through an ontology, abstracting away low-level details of the data
sources themselves. The ontology provides a high-level description of the domain of
interest, and is semantically linked to the data sources by means of a set of mapping
assertions [7, 16]. Typically, the data sources are represented as relational data, the on-
tology is a set of logical axioms over concepts and roles, and each mapping assertion
relates an SQL query over the database to a concept or role of the ontology.
    Making OBDA work efficiently over large amounts of data, requires that query an-
swering over the ontology is first-order (FO)-rewritable1 [8, 1], which in turn limits the
expressiveness of the ontology language, and the degree of detail with which the domain
of interest can be captured. The current language of choice for OBDA is DL-LiteR , the
logic underlying OWL 2 QL [24], which has been specifically designed to ensure FO-
rewritability of query answering. Hence, it does not allow one to express disjunctive
information, or any form of recursion on the data (e.g., as resulting from qualified ex-
istentials on the left-hand side of concept inclusions), since using such constructs in
general causes the loss of FO-rewritability [9]. For this reason, in many situations the
expressive power of DL-LiteR is too restricted to capture real-world scenarios.
    The aim of this work is to overcome these limitations of DL-LiteR by allowing
the use of additional constructs in the ontology. To be able to exploit the added value
coming from OBDA in real-world settings, an important requirement is the efficiency
of query answering, achieved through a rewriting-based approach. This is only possi-
ble for ontology languages that are FO-rewritable. Two general mechanisms that have
?
     This paper is an abridged version of [5].
 1
     Recall that FO queries constitute the core of SQL.
been proposed to cope with computational complexity coming from high expressive-
ness of ontology languages, and that allow one to regain FO-rewritability, are conserva-
tive rewriting [20] and approximation [26, 10]. In the former, an ontology in a powerful
language is rewritten, when possible, into an equivalent one in a restricted language,
while in the latter it is approximated, thus losing part of its semantics.
    In this work, we significantly extend the practical impact of both approaches by
bringing into the picture the mapping, an essential component of OBDA that has been
ignored so far. Indeed, it is a fairly expressive component of an OBDA system, since it
allows one to make use of arbitrary SQL (hence FO) queries to relate the content of the
data source to the elements of the ontology. Hence, a natural question is how one can
use the mapping component to capture as much as possible additional domain seman-
tics, resulting in better approximations or more cases where conservative rewritings are
possible, while maintaining a DL-LiteR ontology.
    We illustrate how this can be done on an example. Consider a bank domain, where
we can specify that a checking account in the name of a person is a simple account
by means of the axiom CAcc u ∃inNameOf.Person v SAcc. Further, assume that the
information about the accounts and their owners is stored in a database D, and that the
predicates CAcc, inNameOf, and Person are connected to D respectively via the map-
ping assertions sql 1 (x)      CAcc(x), sql 2 (x, y)     inNameOf(x, y) and sql 3 (x)
Person(x), where each sql i is an SQL query over D. Then this non-DL-LiteR axiom
can be encoded by adding the assertion sql 1 (x) ./ sql 2 (x, y) ./ sql 3 (y) SAcc(x) to
the mapping. This assertion connects D directly to the ontology term SAcc by making
use of a join of the SQL queries in the original mapping. We observe that the result-
ing mapping, together with the ontology in which the non-DL-LiteR axiom has been
removed, constitutes a conservative rewriting of the original OBDA specification.
    In this paper, we elaborate on this idea, by introducing a novel framework for rewrit-
ing and approximation of OBDA specifications. Specifically, we provide a notion of
rewriting based on query inseparability of OBDA specifications [3]. To deal with those
cases where it is not possible to rewrite the OBDA specification into a query inseparable
one whose ontology is in DL-LiteR , we give a notion of approximation that is sound
for query answering. We develop techniques for rewriting and approximation of OBDA
specifications based on compiling the extra expressiveness into the mappings. We tar-
get rather expressive ontology languages, and for Horn-ALCHIQ, a Horn fragment
of OWL 2, we study decidability of existence of OBDA rewritings, and techniques to
compute them when they exist, and to approximate them, otherwise.
    We have implemented our techniques in a prototype system called O NTO P ROX,
which exploits functionalities provided by the O NTOP [27] and C LIPPER systems [14]
to rewrite or approximate an OBDA specification expressed in Horn-SHIQ to one that
can be directly processed by any OBDA system. We have evaluated O NTO P ROX over
synthetic and real OBDA instances against (i) the default O NTOP behavior, (ii) local se-
mantic approximation (LSA), (iii) global semantic approximation (GSA), and (iv) C LIP -
PER over materialized ABoxes. Using O NTO P ROX, for a few queries we have been able
to obtain more answers (in fact, complete answers, as confirmed by C LIPPER). However,
for many queries O NTO P ROX showed no difference with respect to the default O NTOP
behavior. One reason for this is that in the considered real-world scenario, the mapping
designers put significant effort to manually create complex mappings that overcome the
limitations of DL-LiteR . Essentially they followed the principle of the technique pre-
sented here, producing an OBDA specification that was already “complete” by design.
     The observations above immediately suggest a significant practical value of our ap-
proach, which can be used to facilitate the design of new OBDA specifications for ex-
isting expressive ontologies: instead of a manual compilation, which is cumbersome,
error-prone, and difficult to maintain, mapping designers can write straightforward
mappings, and the resulting OBDA specification can then be automatically transformed
into a DL-LiteR OBDA specification with rich mappings.
     Omitted proofs can be found in the extended version of this paper [4].

2     Preliminaries
2.1   Ontologies
We assume to have the following pairwise disjoint countably infinite alphabets: NC of
concept names, NR of role names, and NI of constants (also called individuals). We
consider ontologies expressed in Description Logics (DLs). Here we present the logics
Horn-ALCHIQ, the Horn fragment of SHIQ without role transitivity, and DL-LiteR ,
for which we develop some of the technical results in the paper. However, the general
approximation framework is applicable to any OWL 2 fragment.
     A Horn-ALCHIQ TBox in normal form       d [18, 14] is a finite set of axioms of the
following forms: concept inclusions (CIs) i Ai v C, role inclusions (RIs) R1 v R2 ,
and role disjointness axioms R1 u R2 v ⊥, where A, Ai denote concept names, R, R1 ,
R2 denote role names P or their inverses P − , and C denotes a concept of the form ⊥,
A, ∃R.A, ∀R.A, or ≤ 1 R.A. For an inverse role R = P − , we use R− to denote P . ⊥
denotes the empty concept/role. A DL-LiteR TBox is a finite set of axioms of the form
B1 v B2 , B1 u B2 v ⊥, R1 v R2 , and R1 u R2 v ⊥, where Bi denotes a concept
of the form A or ∃R.>. In what follows, for simplicity we write ∃R instead of ∃R.>,
we use N to denote either a concept or a role name, and assume that all TBoxes are in
normal form.
     An ABox is a finite set of membership assertions of the form A(c) or P (c, c0 ), where
    0
c, c ∈ NI . For a DL L, an L-ontology is a pair O = hT , Ai, where T is an L-TBox and
A is an ABox. A signature Σ is a finite set of concept and role names. An ontology O
is said to be defined over (or simply, over) Σ if all the concept and role names occurring
in it belong to Σ (and likewise for TBoxes, ABoxes, concept inclusions, etc.). When T
is over Σ, we denote by sig(T ) the subset of Σ actually occurring in T . Moreover we
denote with Ind(A), the set of individuals appearing in A.
     The semantics, models, and the notions of satisfaction and consistency of ontologies
are defined in the standard way. We only point out that we adopt the Unique Name
Assumption (UNA), and for simplicity we also assume to have standard names, i.e., for
every interpretation I and every constant c ∈ NI interpreted by I, we have that cI = c.

2.2   OBDA and Mappings
Let S be a relational schema over a countably infinite set NS of database predicates.
For simplicity, we assume to deal with plain relational schemas without constraints,
and with database instances that directly store abstract objects (as opposed to values).
So, a database instance D of S is a set of ground atoms over the predicates in NS and
the constants in NI .2 Queries over S are expressed in SQL. We use ϕ(x) to denote that
query ϕ has x = x1 , . . . , xn as free (i.e., answer) variables, where n is the arity of ϕ.
Given a database instance D of S and a query ϕ over S, ans(ϕ, D) denotes the set of
tuples of constants in NI computed by evaluating ϕ over D.
     In OBDA, one provides access to an (external) database through an ontology TBox,
which is connected to the database by means of a mapping. Given a source schema
S and a TBox T , a (GAV) mapping assertion between S and T has the form ϕ(x)
A(x) or ϕ0 (x, x0 )    P (x, x0 ), where A and P are respectively concept and role names,
             0      0
and ϕ(x), ϕ (x, x ) are arbitrary (SQL) queries expressed over S, called source queries.
Intuitively, given a database instance D of S and a mapping assertion m = ϕ(x)
A(x), the instances of the concept A generated by m from D is the set ans(ϕ, D);
similarly for a mapping assertion ϕ(x, x0 )       P (x, x0 ).
     An OBDA specification is a triple P = hT , M, Si, where T is a DL TBox, S is
a relational schema, and M is a finite set of mapping assertions. Without loss of gen-
erality, we assume that all concept and role names appearing in M are contained in
sig(T ). An OBDA instance is a pair hP, Di, where P is an OBDA specification, and D
is a database instance of S. The semantics of the OBDA instance hP, Di is specified in
terms of interpretations of the concepts and roles in T . We define it by relying on the
(virtual3 ) ABox AM,D = {N (o) | o ∈ ans(ϕ, D) and ϕ(x)               N (x) in M} gener-
ated by M from D, where N is a concept or role name in T . Then, a model of hP, Di
is simply a model of the ontology hT , AM,D i.
     To abstract away the low-level SQL details in source queries, following [13], we
split each mapping assertion m = ϕ(x)           N (x) in M into two parts. By introducing
an intermediate view name Vm for the SQL query ϕ(x), we obtain a low-level mapping
assertion of the form ϕ(x)         Vm (x), and a high-level mapping assertion of the form
Vm (x)      N (x). Hence, in our technical development we deal only with the high-level
mappings, and directly consider the intermediate views as our data sources.


2.3   Query Answering

We consider conjunctive queries, which are the basic and most important query-
ing mechanism in relational database systems and ontologies. A conjunctive query
(CQ) q(x) over a signature Σ is a formula ∃y. ϕ(x, y), where ϕ is a conjunction
of atoms N (z), such that N is a concept or role name in Σ, and z are variables from
x and y. The set of certain answers to a CQ q(x) over an ontology hT , Ai, denoted
cert(q, hT , Ai), is the set of tuples c of elements from Ind(A) of the same length as
x, such that q(c) (considered as a FO sentence) holds in every model of hT , Ai. We
mention two more query classes. An atomic query (AQ) is a CQ consisting of exactly
one atom whose variables are all free. A CQ with inequalities (CQ6= ) is a CQ that may
contain inequality atoms between the variables of the predicate atoms.
 2
   All our results easily extend to the case where objects are constructed from retrieved database
   values [7].
 3
   We call such an ABox ‘virtual’, because we are not interested in materializing its facts.
    Given a CQ q, an OBDA specification P = hT , M, Si and a database instance D
of S, the answer to q over the OBDA instance hP, Di, denoted cert(q, P, D), is defined
as cert(q, hT , AM,D i). Observe that, when D is inconsistent with P (i.e., hP, Di does
not have a model), then cert(q, P, D) is the set of all possible tuples of constants in
AM,D (of the same arity as q).


3   An OBDA Rewriting Framework
We extend the notion of query inseparability of ontologies [6] to OBDA specifications.
We adopt the proposal by [3], but we do not enforce preservation of inconsistency.
Definition 1. Let Σ be a signature. Two OBDA specifications P1 = hT1 , M1 , Si and
P2 = hT2 , M2 , Si are Σ-CQ inseparable if cert(q, P1 , D) = cert(q, P2 , D), for every
CQ q over Σ and every database instance D of S.
    In OBDA, one must deal with the trade-off between the computational complexity
of query answering and the expressiveness of the ontology language. Suppose that in
an OBDA specification P = hT , M, Si, T is expressed in an ontology language L that
does not allow efficient query answering. A possible solution is to exploit the expressive
power of the mapping to compute a new OBDA specification P 0 = hT 0 , M0 , Si in
which T 0 is expressed in a language Lt more suitable for query answering than L. The
aim is to encode in M0 not only M but also part of the semantics of T , so that P 0 is
query-inseparable from P. This leads to the notion of rewriting of OBDA specifications.
Definition 2. Let Lt be an ontology language. The OBDA specification P 0 =
hT 0 , M0 , Si is a CQ-rewriting in Lt of the OBDA specification P = hT , M, Si if
(i) sig(T ) ⊆ sig(T 0 ), (ii) T 0 is an Lt -TBox, and (iii) P and P 0 are Σ-CQ inseparable,
for Σ = sig(T ). If such P 0 exists, we say that P is CQ-rewritable into Lt .
     We observe that the new OBDA specification can be defined over a signature that
is an extension of that of the original TBox. This is specified by condition (i). In condi-
tion (ii), we impose that the new ontology is specified in the target language Lt . Finally,
condition (iii) imposes that the OBDA specifications cannot be distinguished by CQs
over the original TBox. Note that the definition allows for changing the ontology and
the mappings, but not the source schema, accounting for the fact that the data sources
might not be under the control of the designer of the OBDA specification.
     As expected, it is not always possible to obtain a CQ-rewriting of P in an ontology
language Lt that allows for efficient query answering. Indeed, the combined expressive-
ness of Lt with the new mappings might not be sufficient to simulate query answering
over P without loss. In these cases, we can resort to approximating query answers over
P in a sound way, which means that the answers to queries posed over the new speci-
fication are contained in those produced by querying P. Hence, we say that the OBDA
specification P 0 = hT 0 , M0 , Si is a sound CQ-approximation in Lt of the OBDA spec-
ification P = hT , M, Si if P 0 satisfies (i), (ii), and cert(q, P 0 , D) ⊆ cert(q, P, D), for
each CQ q over sig(T ) and for each instance D of S.
     Next, we study CQ-rewritability of OBDA specifications into DL-LiteR , developing
suitable techniques.
4      Rewriting OBDA Specifications

In this section, we develop our OBDA rewriting technique, which relies on Datalog
rewritings of the TBox (and mappings). Recall that a Datalog program (with inequali-
ties) is a finite set of definite Horn clauses without functions symbols, i.e., rules of the
form head ← ϕ, where ϕ is a finite non-empty list of predicate atoms and guarded
inequalities called the body of the rule, and head is an atom, called the head of the rule,
all of whose variables occur in the body. The predicates that occur in rule heads are
called intensional (IDB), the other predicates are called extensional (EDB).


4.1     ET-mappings

Now, we extend the notion of T-mappings introduced by [27], and define the notion
of an ET-mapping that results from compiling into the mapping the expressiveness of
ontology languages that are Datalog rewritable, as introduced below.
     We first introduce notation we need. Let Π be a Datalog program and N an IDB
                                                                      i
predicate. For a database D over the EDB predicates of Π, let NΠ        (D) denote the set of
facts about N that can be    deduced   from  D  by at most i ≥ 1 applications of the rules in
               ∞                   i                                        ∞
                          S
Π, and let NΠ     (D) = i≥1 NΠ       (D). It is known that the predicate NΠ   (·) defined by
                                                                    6=
N in Π can be characterized by a possibly    S infinite union of CQ s N[11], i.e., there exist
    6=   N     N                   ∞
CQ s ϕ0 , ϕ1 , . . . such that NΠ (D) = i≥0 {N (a) | a ∈ ans(ϕi , D)}, for every D.
The ϕN i ’s are called the expansions of N and can be described in terms of expansion
trees; cf. [4, Appendix A]. We denote by ΦΠ (N ) the set of expansion trees for N in
Π, and abusing notation also the (possibly infinite) union of CQ6= s corresponding to
it. Note that ΦΠ (N ) might be infinite due to the presence of IDB predicates that are
recursive, i.e., either directly or indirectly refer to themselves.
     We call a TBox T Datalog rewritable if it admits a translation ΠT to Datalog that
preserves consistency and answers to AQs (see, e.g., the translations by [17], [14], and
[28] for Horn-SHIQ, and by [12] for SHI). We assume that ΠT makes use of a
special nullary predicate ⊥ that encodes inconsistency, i.e., for an ABox A, hT , Ai
is consistent iff ⊥∞                    4
                     ΠT (A) is empty. We also assume that ΠT includes the following
auxiliary rules, which ensure that ΠT derives all possible facts constructed over sig(T )
and Ind(A) whenever hT , Ai is inconsistent:

             >∆ (x) ← A(x); >∆ (x) ← P (x, y); >∆ (y) ← P (x, y);
                          A(x) ← ⊥, >∆ (x); P (x, y) ← ⊥, >∆ (x), >∆ (y);

where A and P respectively range over concept and role names in sig(T ), and >∆ is a
fresh unary predicate denoting the set of all the individuals appearing in A.
    In the following, we denote with ΠM the (high-level) mapping M viewed as a
Datalog program, and with ΠT ,M the Datalog program ΠT ∪ ΠM associated to a
Datalog rewritable TBox T and a mapping M. From the properties of the translation
ΠT (and the simple structure of ΠM ), we obtain that ΠT ,M satisfies the following:
 4
     Here we simply consider A as a database.
     Input: Horn-ALCHIQ TBox T and mapping M.
     Output: DL-LiteR TBox Tr and ET-mapping Mc .

     Step 1: T1 is obtained from T by adding all CIs of the form Ai v ∃R.( A0j ) entailed by
                                                                    d          d
                                          0
             T , for concept names Ai , Aj ∈ sig(T ).
     Step 2: T2 = norm∃ (T1 ).
     Step 3: T3 = normu (T2 ).
     Step 4: Mc is etmT3 (M), and Tr is the DL-LiteR TBox consisting of all DL-LiteR axioms
             over sig(T3 ) entailed by T3 (including the trivial ones N v N ).

                    Fig. 1. OBDA specification rewriting procedure RewObda.

Lemma 1. Let hT , M, Si be an OBDA specification where T is Datalog rewritable.
Then, for every database instance D of S, concept or role name N of T , and a in
                                                           ∞
Ind(AM,D ), we have that hT , AM,D i |= N (a) iff N (a) ∈ NΠ T ,M
                                                                  (D).

    For a predicate N , we say that an expansion ϕN ∈ ΦΠT ,M (N ) is DB-defined if ϕN
is defined over database predicates. Now we are ready to define ET-mappings.
Definition 3. Let hT , M, Si be an OBDA specification where T is Datalog rewritable.
The ET-mapping for M and T , denoted etmT (M), is defined as the set of assertions
of the form ϕN (x)     N (x) such that N is a concept or role name in T , and ϕN ∈
ΦΠT ,M (N ) is DB-defined.
    It is easy to show that, for M0 = etmT (M) and each database instance D, the vir-
tual ABox AM0 ,D (which can be defined for ET-mappings as for ordinary mappings)
contains all facts entailed by hT , AM,D i. In this sense, the ET-mapping etmT (M)
plays for a Datalog rewritable TBox T the same role as T-mappings play for (the sim-
pler) DL-LiteR TBoxes. Note that, in general, an ET-mapping is not a mapping, as it
may contain infinitely many assertions. However, AM0 ,D is still finite, given that it is
constructed over the finite number of constants appearing in D.


4.2     Rewriting Horn-ALCHIQ OBDA Specifications to DL-LiteR

Let hT , M, Si be an OBDA specification, where T is a Horn-ALCHIQ TBox over a
signature Σ. Procedure RewObda(T , M), in Figure 1, constructs a DL-LiteR TBox Tr
and an ET-mapping Mc such that hTr , Mc , Si is Σ-CQ inseparable from hT , M, Si.
                             applies to T1 the normalization step norm∃ , which gets rid
    In Step 2, the procedure d
of concepts of the form ∃R.( A0j ) in the right-hand side of CIs. This is achieved by the
                                                  dm               dn
following well-known substitution [1]: every CI i=1 Ai v ∃R.( j=1 A0j ) in T1 is re-
             dm
placed with i=1 Ai v ∃Pnew , Pnew v R, and > v ∀Pnew .A0j , for 1 ≤ j ≤ n, where
Pnew is a fresh role name. Notice that the latter two forms of inclusions introduced by
norm∃ are actually in DL-LiteR , as > v ∀Pnew .A0j is equivalent to ∃Pnew   −
                                                                                v A0j . In
Step 3, the procedure applies to T2 a further normalization step, normu , which intro-
duces a fresh concept name AA1 u···uAn for each concept conjunction A1 u · · · u An ap-
pearing in T2 , and adds A1 u· · ·uAn ≡ AA1 u···uAn 5 to the TBox. Note that norm∃ (T1 )
 5
     We use ‘≡’ to abbreviate inclusion in both directions.
and normu (T2 ) are model-conservative extensions of T1 and T2 , respectively [21], as
one can easily show. We denote by rew(T ) the resulting TBox Tr , which in general is
exponential in the size of T , and by comp(T , M) the resulting ET-mapping Mc , which
in general is infinite.
Example 1 Assume that the domain knowledge is represented by the TBox T =
{ CAccu∃inNameOf.Person v SAcc } about bank accounts. Its normalization is T b =
{Person v ∀inNameOf − .A1 , CAcc u A1 v SAcc}. Assume that the database schema
S b consists of the two relations ENT(ID, TYPE, EMPID), PROD(NUM, TYPE, CUSTID),
whose data are mapped to the ontology terms by means of the following mapping M:
     mP : SELECT ID AS X FROM ENT WHERE ENT.TYPE=’P’     Person(X)
     mN : SELECT NUM AS X, CUSTID AS Y FROM PROD    inNameOf(X, Y)
     mC : SELECT NUM AS X FROM PROD P WHERE P.TYPE=’B’     CAcc(X)
The corresponding high-level mapping Mb consists of the assertions:
     hP : {x | VPerson (x)}       Person(x)
     hN : {x, y | VinNameOf (x, y)}   inNameOf(x, y)
     hC : {x | VCAcc (x)}        CAcc(x)
Now, consider the OBDA specification P b = hT b , Mb , S b i. The RewObda procedure
invoked on (T b , Mb ) produces:
 – The intermediate TBoxes T1b and T2b coinciding with T b , and T3b extending T b with
   ACAccuA1 ≡ CAcc u A1 .
 – The ET-mapping Mbc = etmT3b (Mb ), which extends Mb with the following three
   assertions: {x | VinNameOf (x, y), VPerson (y)} A1 (x);
                {x | VCAcc (x), VinNameOf (x, y), VPerson (y)}   SAcc(x);
                {x | VCAcc (x), VinNameOf (x, y), VPerson (y)}   ACAccuA1 (x).

The procedure returns the DL-LiteR TBox Trb = {ACAccuA1 v CAcc, ACAccuA1 v A1 ,
                                                                     b
ACAccuA1 v SAcc} and the mapping Mbc . It is possible to show that PDL-LiteR
                                                                             =
   b   b    b                        b
hTr , Mc , S i is a CQ-rewriting of P into DL-LiteR .
     The TBox T3 obtained as an intermediate result in Step 3 of RewObda(T , M), is a
model-conservative extension of T , tailored towards capturing in DL-LiteR the answers
to tree-shaped CQs. This is obtained by introducing in Step 2 sufficiently many new role
names, and in Step 3 new concept names, so as to capture entailed axioms that generate
the tree-shaped parts of models. Note that at-most number restrictions do not participate
in the generation of the tree-shaped parts of models, hence are not considered explicitly
in Steps 1 to 3. On the other hand, the ET-mapping Mc = comp(T , M) generates from
a database instance a virtual ABox Av that is complete with respect to all ABox facts
that might be involved in the generation of the tree-shaped parts of models of Tr and
Av . This allows us to prove the main result of this section.
Theorem 2. Let hT , M, Si be an OBDA specification such that T is a
Horn-ALCHIQ TBox, and let hTr , Mc i = RewObda(T , M). Then hT , M, Si and
hTr , Mc , Si are Σ-CQ inseparable, for Σ = sig(T ).
    Clearly, hTr , Mc , Si is a candidate for being a CQ-rewriting of hT , M, Si into DL-
LiteR . However, since Mc might be an infinite set, hTr , Mc , Si might not be an OBDA
specification and hence might not be effectively usable for query answering. Next we
address this issue, and show that in some cases we obtain proper CQ-rewritings, while
in others we have to resort to approximations.


5   Approximating OBDA Specifications
To obtain from an ET-mapping a proper mapping, we exploit the notion of predicate
boundedness in Datalog, and use a bound on the depth of Datalog expansion trees.
    An IDB predicate N is said to be bounded in a Datalog program Π, if there exists
                                                                                  k
a constant k depending only on Π such that, for every database D, we have NΠ        (D) =
   ∞
NΠ (D) [11]. If N is bounded in Π, then there exists an equivalent Datalog program
Π 0 such that ΦΠ 0 (N ) is finite, and thus represents a finite union of CQ6= s. It is well
known that predicate boundedness for Datalog is undecidable in general [15]. We say
that Ω is a boundedness oracle if for a Datalog program Π and a predicate N it returns
one of the three answers: N is bounded in Π, N is not bounded in Π, or unknown.
When N is bounded, Ω returns also a finite union of CQ6= s, denoted ΩΠ (N ), defining
N . Given a constant k, ΦkΠ (N ) denotes the set of trees (and the corresponding union of
CQ6= s) in ΦΠ (N ) of depth at most k, hence ΦkΠ (N ) is always finite.
    We introduce a cutting operator cutΩ  k , which is parametric with respect to the cut-
ting depth k > 0 and the boundedness oracle Ω, which, when applied to a predicate N
and a Datalog program Π, returns a finite union of CQ6= s as follows:
                      (
         Ω
                       ΩΠ (N ), if N is bounded in Π w.r.t. Ω
     cutk N, Π =
                        ΦkΠ (N ), otherwise.
We apply cutting also to ET-mappings: given an ET-mapping etmT (M), the mapping
cutΩ                                                          N
    k (etmT (M)) is the (finite) set of mapping assertions ϕ (x)        N (x) s.t. N is a
                                   N       Ω
concept or role name in T , and ϕ ∈ cutk (N, ΠT ,M ) is DB-defined.
    The following theorem provides a sufficient condition for CQ-rewritability into DL-
LiteR in terms of the well-known notion of first-order (FO)-rewritability, which we
recall here: a query q is FO-rewritable with respect to a TBox T , if there exists a
FO query q 0 such that cert(q, hT , Ai) = ans(q 0 , A), for every ABox A over sig(T )
(viewed as a database). It uses the fact that if an AQ is FO-rewritable with respect to a
Horn-ALCHIQ TBox T , then it is actually rewritable into a union of CQ6= s, and the
fact that if T is FO-rewritable for AQs (i.e., every AQ is FO-rewritable with respect to
T ), then each concept and role name is bounded in ΠT [22, 2].
Theorem 3. Let hT , M, Si be an OBDA specification such that T is a
Horn-ALCHIQ TBox. Further, let Tr = rew(T ) and M0 = cutΩ             k (comp(T , M)),
for a boundedness oracle Ω and some k > 0. If T is FO-rewritable for AQs, then
hT , M, Si is CQ-rewritable into DL-LiteR , and hTr , M0 , Si is its CQ-rewriting. Oth-
erwise, hTr , M0 , Si is a sound CQ-approximation of hT , M, Si in DL-LiteR .
    This gives us decidable conditions for rewritability of OBDA specifications in sev-
eral significant cases. It is shown by [2] and [22] that FO-rewritability of AQs relative
to Horn-SHI-TBoxes, Horn-ALCF-TBoxes, and Horn-ALCIF-TBoxes of depth two
is decidable. In fact, these FO-rewritability algorithms provide us with a boundedness
oracle Ω: for each concept and role name N in T , they return a FO-rewriting of the AQ
N (x) that combined with the mapping M results in ΩΠT ,M (N ).
    Unfortunately, a complete characterization of CQ-rewritability into DL-LiteR is not
possible if arbitrary FO-queries are allowed in the (low-level) mapping.
Theorem 4. The problem of checking whether an OBDA specification with an EL on-
tology and FO source queries in the mapping is CQ-rewritable into DL-LiteR is unde-
cidable.
   However, if we admit only unions of CQs in the (low-level) mapping, it follows
from decidability of boundedness of monadic Datalog programs [11] that we can fully
characterize CQ-rewritability.

Theorem 5. The problem of checking whether an OBDA specification with a Horn-
ALCHI ontology of depth one and unions of CQs as source queries in the mapping is
CQ-rewritable into DL-LiteR is decidable.


6    Implementation
To demonstrate the feasibility of our OBDA specification rewriting technique, we
implemented the O NTO P ROX6 prototype system and evaluated it over synthetic and
real OBDA instances. It relies on the OBDA reasoner O NTOP7 and the complete
Horn-SHIQ CQ-answering system C LIPPER8 , used as Java libraries, on a standard Pro-
log engine (SWI -P ROLOG9 ), and on an OWL 2 reasoner (H ERMI T10 ).
    Essentially, O NTO P ROX implements the rewriting and compiling procedure de-
scribed in Figure 1, but instead of computing the (possibly infinite) ET-mapping
comp(T , M), it computes its finite part cutk (comp(T, M)). So, it gets as input an
OWL 2 OBDA specification hTOWL2 , M, Si and a positive integer k, and produces a
DL-LiteR OBDA specification that can be used with any OBDA system. Below we
describe some of the implementation details:
(1) TOWL2 is first approximated to the Horn-SHIQ TBox T by dropping the axioms
    outside this fragment.
(2) T is translated intoda (possibly recursive)  Datalog program Π and saturated with
    all CIs of the form Ai v ∃R.( A0j ), using functionalities provided by C LIPPER.
                                      d
(3) The expansions cutk (ΦΠ (X)) are computed by an auxiliary Prolog program using
    Prolog meta-programming.
(4) To produce actual mappings that can be used by an OBDA reasoner, the views in
    the high-level mapping cutk (comp(T , M)) are replaced with their original SQL
    definitions using functionalities of O NTOP.
(5) The DL-LiteR closure is computed by relying on the OWL 2 reasoner for Horn--
    SHIQ TBox classification.
 6
   https://github.com/ontop/ontoprox/
 7
   http://ontop.inf.unibz.it/
 8
   http://www.kr.tuwien.ac.at/research/systems/clipper/
 9
   http://www.swi-prolog.org/
10
   http://hermit-reasoner.com/
    For the experiments, we have considered two scenarios:
UOBM. The university ontology benchmark (UOBM) [23] comes with a SHOIN
ontology (with 69 concepts, 35 roles, 9 attributes, and 204 TBox axioms), and an ABox
generator. We have designed a suitable database schema for the generated ABox, con-
verted the ABox to a 10MB database instance for the schema, and manually created the
mapping, consisting of 96 assertions11 .
Telecom benchmark. The telecommunications ontology models a portion of the net-
work of a leading telecommunications company, namely the portion connecting sub-
scribers to the operating centers of their service providers. The current specification
consists of an OWL 2 ontology with 152 concepts, 53 roles, 73 attributes, 458 TBox
axioms, and of a mapping with 264 mapping assertions. The database instance contains
32GB of real-world data.
    For each OBDA instance hhT , M, Si, Di, we have evaluated the number of query
answers and the query answering time with respect to five different setups:
(1) The default behavior of O NTOP v1.15, which simply ignores all non-DL-LiteR ax-
    ioms in T , i.e., using hT 1 , M, Si where T 1 are all the DL-LiteR axioms in T .
(2) The local semantic approximation (LSA) of T in DL-LiteR , i.e., using hT 2 , M, Si
    where T 2 is obtained as the union, for each axiom α ∈ T , of the set of DL-LiteR
    axioms Γ (α) entailed by α [10].
(3) The global semantic approximation (GSA) of T in DL-LiteR , i.e., using
    hT 3 , M, Si where T 3 is the DL-LiteR closure of T [25].
(4) Result of O NTO P ROX, i.e., hrew(T ), cut5 (comp(T, M)), Si.
(5) C LIPPER over the materialization of the virtual ABox.
The details of the evaluation can be found in [4].

7     Conclusions
We proposed a novel framework for rewriting and approximating OBDA specifications
in an expressive ontology language to specifications in a weaker language, in which the
core idea is to exploit the mapping layer to encode part of the semantics of the original
specification, and we developed techniques for DL-LiteR as the target language.
     We plan to continue our work along the following directions: (i) extend our
technique to Horn-SHIQ, and, more generally, to Datalog rewritable TBoxes [12];
(ii) deepen our understanding of the computational complexity of deciding CQ-
rewritability of OBDA specifications into DL-LiteR ; (iii) extend our technique to
SPARQL queries under different OWL entailment regimes [19]; (iv) carry out more
extensive experiments, considering queries that contain existentially quantified vari-
ables. This will allow us to verify the effectiveness of RewObda, which was designed
specifically to deal with existentially implied objects.
Acknowledgement. This work is partially supported by the EU under the large-scale
integrating project (IP) Optique (Scalable End-user Access to Big Data), grant agree-
ment n. FP7-318338. We thank Martin Rezk for insightful discussions, and Benjamin
Cogrel and Elem Güzel for help with the experimentation.
11
     https://github.com/ontop/ontop-examples/tree/master/aaai-2016-ontoprox/uobm
References
 1. Alessandro Artale, Diego Calvanese, Roman Kontchakov, and Michael Zakharyaschev. The
    DL-Lite family and relations. J. of Artificial Intelligence Research, 36:1–69, 2009.
 2. Meghyn Bienvenu, Carsten Lutz, and Frank Wolter. First-order rewritability of atomic
    queries in Horn description logics. In Proc. of the 23rd Int. Joint Conf. on Artificial In-
    telligence (IJCAI), pages 754–760, 2013.
 3. Meghyn Bienvenu and Riccardo Rosati. Query-based comparison of OBDA specifications.
    In Proc. of the 28th Int. Workshop on Description Logic (DL), volume 1350 of CEUR Elec-
    tronic Workshop Proceedings, 2015.
 4. Elena Botoeva, Diego Calvanese, Valerio Santarelli, Domenico Fabio Savo, Alessandro Soli-
    mando, and Guohui Xiao. Beyond OWL 2 QL in OBDA: Rewritings and approximations
    (Extended version). CoRR Technical Report abs/1511.08412, arXiv.org e-Print archive,
    2015. Available at http://arxiv.org/abs/1511.08412.
 5. Elena Botoeva, Diego Calvanese, Valerio Santarelli, Domenico Fabio Savo, Alessandro Soli-
    mando, and Guohui Xiao. Beyond OWL 2 QL in OBDA: Rewritings and approximations.
    In Proc. of the 30th AAAI Conf. on Artificial Intelligence (AAAI), 2016.
 6. Elena Botoeva, Roman Kontchakov, Vladislav Ryzhikov, Frank Wolter, and Michael Za-
    kharyaschev. Query inseparability for description logic knowledge bases. In Proc. of the
    14th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR), pages
    238–247. AAAI Press, 2014.
 7. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, Antonella
    Poggi, Mariano Rodriguez-Muro, and Riccardo Rosati. Ontologies and databases: The DL-
    Lite approach. In 5th Reasoning Web Int. Summer School Tutorial Lectures (RW), volume
    5689 of LNCS, pages 255–356. Springer, 2009.
 8. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Ric-
    cardo Rosati. Tractable reasoning and efficient query answering in description logics: The
    DL-Lite family. J. of Automated Reasoning, 39(3):385–429, 2007.
 9. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Ric-
    cardo Rosati. Data complexity of query answering in description logics. Artificial Intelli-
    gence, 195:335–360, 2013.
10. Marco Console, José Mora, Riccardo Rosati, Valerio Santarelli, and Domenico Fabio Savo.
    Effective computation of maximal sound approximations of description logic ontologies. In
    Proc. of the 13th Int. Semantic Web Conf. (ISWC), volume 8797 of LNCS, pages 164–179.
    Springer, 2014.
11. Stavros S. Cosmadakis, Haim Gaifman, Paris C. Kanellakis, and Moshe Y. Vardi. Decidable
    optimization problems for database logic programs. In Proc. of the 20th ACM SIGACT Symp.
    on Theory of Computing (STOC), pages 477–490, 1988.
12. Bernardo Cuenca Grau, Boris Motik, Giorgos Stoilos, and Ian Horrocks. Computing dat-
    alog rewritings beyond Horn ontologies. In Proc. of the 23rd Int. Joint Conf. on Artificial
    Intelligence (IJCAI), pages 832–838, 2013.
13. Floriana Di Pinto, Domenico Lembo, Maurizio Lenzerini, Riccardo Mancini, Antonella
    Poggi, Riccardo Rosati, Marco Ruzzi, and Domenico Fabio Savo. Optimizing query rewrit-
    ing in ontology-based data access. In Proc. of the 16th Int. Conf. on Extending Database
    Technology (EDBT), pages 561–572. ACM Press, 2013.
14. Thomas Eiter, Magdalena Ortiz, Mantas Simkus, Trung-Kien Tran, and Guohui Xiao. Query
    rewriting for Horn-SHIQ plus rules. In Proc. of the 26th AAAI Conf. on Artificial Intelligence
    (AAAI), pages 726–733. AAAI Press, 2012.
15. Haim Gaifman, Harry G. Mairson, Yehoshua Sagiv, and Moshe Y. Vardi. Undecidable opti-
    mization problems for database logic programs. In Proc. of the 2nd IEEE Symp. on Logic in
    Computer Science (LICS), pages 106–115, 1987.
16. Martin Giese, Ahmet Soylu, Guillermo Vega-Gorgojo, Arild Waaler, Peter Haase, Ernesto
    Jiménez-Ruiz, Davide Lanti, Martin Rezk, Guohui Xiao, Özgür L. Özçep, and Riccardo
    Rosati. Optique: Zooming in on Big Data. IEEE Computer, 48(3):60–67, 2015.
17. Ullrich Hustadt, Boris Motik, and Ulrike Sattler. Data complexity of reasoning in very ex-
    pressive description logics. In Proc. of the 19th Int. Joint Conf. on Artificial Intelligence
    (IJCAI), pages 466–471, 2005.
18. Yevgeny Kazakov. Consequence-driven reasoning for Horn-SHIQ ontologies. In Proc. of
    the 21st Int. Joint Conf. on Artificial Intelligence (IJCAI), pages 2040–2045, 2009.
19. Roman Kontchakov, Martin Rezk, Mariano Rodriguez-Muro, Guohui Xiao, and Michael
    Zakharyaschev. Answering SPARQL queries over databases under OWL 2 QL entailment
    regime. In Proc. of the 13th Int. Semantic Web Conf. (ISWC), LNCS. Springer, 2014.
20. Carsten Lutz, Robert Piro, and Frank Wolter. Description logic TBoxes: Model-theoretic
    characterizations and rewritability. In Proc. of the 22nd Int. Joint Conf. on Artificial Intelli-
    gence (IJCAI), pages 983–988, 2011.
21. Carsten Lutz, Dirk Walther, and Frank Wolter. Conservative extensions in expressive de-
    scription logics. In Proc. of the 20th Int. Joint Conf. on Artificial Intelligence (IJCAI), pages
    453–458, 2007.
22. Carsten Lutz and Frank Wolter. Non-uniform data complexity of query answering in de-
    scription logics. In Proc. of the 24th Int. Workshop on Description Logic (DL), volume 745
    of CEUR Electronic Workshop Proceedings, 2011.
23. Li Ma, Yang Yang, Zhaoming Qiu, GuoTong Xie, Yue Pan, and Shengping Liu. Towards
    a complete OWL ontology benchmark. In Proc. of the 3rd European Semantic Web Conf.
    (ESWC), volume 4011 of LNCS, pages 125–139. Springer, 2006.
24. Boris Motik, Achille Fokoue, Ian Horrocks, Zhe Wu, Carsten Lutz, and Bernardo
    Cuenca Grau. OWL Web Ontology Language profiles. W3C Recommendation, World
    Wide Web Consortium, October 2009.                Available at http://www.w3.org/TR/
    owl-profiles/.
25. Jeff Z. Pan and Edward Thomas. Approximating OWL-DL ontologies. In Proc. of the 21st
    AAAI Conf. on Artificial Intelligence (AAAI), pages 1434–1439, 2007.
26. Yuan Ren, Jeff Z. Pan, and Yuting Zhao. Soundness preserving approximation for TBox
    reasoning. In Proc. of the 24th AAAI Conf. on Artificial Intelligence (AAAI), 2010.
27. Mariano Rodriguez-Muro, Roman Kontchakov, and Michael Zakharyaschev. Ontology-
    based data access: Ontop of databases. In Proc. of the 12th Int. Semantic Web Conf. (ISWC),
    volume 8218 of LNCS, pages 558–573. Springer, 2013.
28. Despoina Trivela, Giorgos Stoilos, Alexandros Chortaras, and Giorgos Stamou. Optimising
    resolution-based rewriting algorithms for OWL ontologies. J. of Web Semantics, pages 30–
    49, 2015.