<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Rewriting Count Queries over DL-Lite TBoxes with Number Restrictions?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Diego Calvanese</string-name>
          <email>diego.calvanese@umu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julien Corman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Lanti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simon Razniewski</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Free University of Bozen-Bolzano</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Max-Planck-Institut für Informatik</institution>
          ,
          <addr-line>Saarbrücken</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Umeå University</institution>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We propose a query rewriting algorithm for a restricted class of conjunctive queries evaluated under count semantics over a DL-Lite knowledge base. The target query language is an extension of relational algebra with aggregation and arithmetic functions, which can be translated into SQL. The algorithm supports number restrictions on the RHS of axioms in the input TBox, which can be used to encode statistics. The size of the output query remains linear in the binary encoding of these numbers, which improves upon previously proposed approaches.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Uni v
MedUni v
BigUni v
1000 enrolledIn
5000 enrolledIn
20000 enrolledIn
? Copyright © 2020 for this paper by its authors. Use permitted under Creative</p>
      <p>
        Commons License Attribution 4.0 International (CC BY 4.0)
1 Also referred to as OBDA (for Ontology Based Data Access), when emphasis is
placed on mappings connecting external data sources to a TBox [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <sec id="sec-1-1">
        <title>And an ABox of the form:</title>
        <p>Uni(UniBZ); BigUni(UW); BigUni(TUW); : : : ;
region(UniBZ; SouthTyrol); region(UW; Vienna); region(TUW; Vienna); : : : ;
enrolledIn(Alice; UniBZ); enrolledIn(Bob; UW); enrolledIn(Carol; TUW); : : :</p>
      </sec>
      <sec id="sec-1-2">
        <title>In practice, the information about enrollment in such KB may be incomplete.</title>
        <p>First, aggregated data about a whole university may be available, but not the
detailed records. The opposite may also hold: detailed records may be provided
by a university, but the central repository may not have an aggregated value
for it. In such scenarios, even if the data is incomplete, one may still want to
compute a lower bound on the number of enrollments of students within each
region. In more formal terms, one might be interested in counting the number
of answers, as in the following query:
q(count(x; y))</p>
        <p>enrolledIn(x; y) ^ region(y; Vienna)</p>
      </sec>
      <sec id="sec-1-3">
        <title>Answering such query may require arithmetic operations that combine num</title>
        <p>bers present in number restrictions in the TBox, and numbers obtained by
counting ABox statements. For instance, in this example, assume that TUW and UW
are the only two universities in the KB that are registered in the region of Vienna.</p>
      </sec>
      <sec id="sec-1-4">
        <title>Assume also that UW does not share enrollment records, whereas TUW shares</title>
        <p>them, however they may be incomplete (for instance, because some faculties
within TUW have not yet communicated their information regarding
enrollments). Then, the output value for count(x; y) should be the sum of: 20 000 on
the one hand (for UW ), and the largest value between 20 000 and n on the other
hand (for TUW ), where n is the number of enrollment records shared by TUW.</p>
        <p>
          Query answering over DL KBs has been extensively studied for boolean and
enumeration queries. With respect to the focus of this paper, a relevant result [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]
is that for conjunctive queries (CQs) and unions of CQs (UCQs), answering a
query over a KB expressed in certain lightweight DLs can be performed by
rewriting it, according to the axioms in the TBox, into a relational algebra
(RA) query over the ABox. This property, called FO rewritability, allows one
to use an RDBMS for the evaluation of the rewritten query, thus leveraging
already mature optimization techniques implemented in such systems. Among
these DLs, the DL-Lite family [
          <xref ref-type="bibr" rid="ref1 ref6">6,1</xref>
          ] has been widely studied and adopted in
        </p>
      </sec>
      <sec id="sec-1-5">
        <title>OMQA/OBDA systems, resulting in the OWL 2 QL standard [13].</title>
      </sec>
      <sec id="sec-1-6">
        <title>In comparison, the problem of counting answers to a CQ over a DL-Lite</title>
        <p>
          KB has seen little interest in the literature. A seminal result was provided in
[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], showing that the problem is significantly harder already for relatively
inexpressive DLs, with a shift from AC0-membership to coNP-completeness in data
complexity, i.e., assuming the sizes of the query and TBox to be fixed. Another
key result was provided in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], but for a slightly different semantics called bag
semantics : for certain variants of DL-Lite, CQs that are rooted (i.e., with at
least one constant or answer variable) can be rewritten into a variant of bag
        </p>
      </sec>
      <sec id="sec-1-7">
        <title>RA. However, this result is provided for DLs without number restrictions (like</title>
        <p>
          1000 enrolledIn in the example above). In addition, because bag semantics
differs from the count semantics studied in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], this rewritability result does not
directly apply to our setting. Finally, in [
          <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
          ], it was shown that under count
semantics (and for some variants of DL-Lite), rooted2 CQs can be rewritten
into a variant of RA with aggregation and arithmetic functions. However, the
provided rewriting algorithm may yield a query whose size is polynomial in the
values of the numbers that occur in the TBox, i.e., exponential in the binary
representation of such numbers, which is impractical.
        </p>
      </sec>
      <sec id="sec-1-8">
        <title>In this paper, we improve upon this last result, showing how for the same</title>
      </sec>
      <sec id="sec-1-9">
        <title>DL, and under count semantics, rooted CQs can be rewritten into a different</title>
        <p>extension of RA, such that the number of algebraic operators in the output
query does not depend on the numbers that occur in the TBox, but only on
the number of axioms of the TBox. As a consequence, the size of the resulting
query is polynomial in the numbers that appear in the number restrictions of the</p>
      </sec>
      <sec id="sec-1-10">
        <title>TBox, even when such numbers are represented in binary. This is an important step towards efficient query answering in such setting.</title>
        <p>2</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <sec id="sec-2-1">
        <title>We assume mutually disjoint countably infinite sets NI of individuals (a.k.a.</title>
        <p>constants ), NE of anonymous individuals (induced by existential quantification),</p>
      </sec>
      <sec id="sec-2-2">
        <title>NV of variables, NC of concept names (i.e., unary predicates, denoted with A),</title>
        <p>and NR of role names (i.e., binary predicates, denoted with P ). An atom is
an expression of the form A(s) or P (s; t), with A 2 NC, P 2 NR, and s; t 2
NI [ NE [ NV. In the following, boldface letters like t denote tuples, and we
sometimes treat tuples as sets. If t = (t1; ::; tn) is a tuple and f a function
defined for each ti, we use f (t) to denote the tuple (f (t1); ::; f (tn)).</p>
        <p>An interpretation I is a FO structure h I ; I i, where the domain I is
a non-empty subset of NI [ NE, and the interpretation function I maps each
constant c 2 NI to itself (cI = c, or in other words, we adopt the standard names
assumption), each concept name A 2 NC to a set AI I , and each role name
P 2 NR to a binary relation P I I I .</p>
        <p>If I = h I ; :I i is an interpretation, we may use I to denote the set of atoms
fA(s) j A 2 NC; s 2 AI g [ fP (s; t) j P 2 NR and (s; t) 2 P I g. Conversely, if
S is a set of unary and binary atoms with arguments in NI [ NE, we may view
it as the interpretation h S ; S i, where S is the union of the arguments of
all atoms in S, and S is defined by AS = fs j A(s) 2 Sg, for A 2 NC, and
P S = f(s; t) j P (s; t) 2 Sg, for P 2 NR.</p>
        <p>
          A KB is a pair K = hT ; Ai, where A, called ABox, is a finite set of atoms
with arguments in NI, and T , called TBox, is a finite set of axioms. In this
2 Precisely, the result provided in [
          <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
          ] holds for rooted and connected CQs. This
result immediately extends to rooted but disconnected CQs, by observing that the
definition of rootedness requires each connected component to be rooted. Therefore,
to answer such CQ, it suffices to compute the answers to each connected component
separately, and then the cartesian product of these answers, multiplying the obtained
cardinalities.
        </p>
        <p>R
! P j P</p>
        <p>
          B
! A j
paper, we focus on (fragments of) DL-LitecHoNre– [
          <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
          ], a member of the the
DLLite family [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] where each axiom is a concept inclusion of the form B v C or
a role inclusion of the form R1 v R2, following the grammar of Figure 1. From
now on, we use the symbols B, C, P and R in accordance with this grammar,
and we call these elements respectively basic concepts, concepts, atomic roles and
roles. Concepts of the form n R are called number restrictions. The fragment of
DL-LitecHoNre– that disallows role inclusions and number restrictions where n &gt; 1
is called DL-Litecore [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <sec id="sec-2-2-1">
          <title>The evaluation CI (resp. RI ) of a concept C (resp. role R) over interpretation</title>
          <p>
            an I is defined inductively over the structure of C (resp. R) as usual (see [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ]
for a definition). An interpretation I is a model of hT ; Ai iff A I holds, and
          </p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>BI CI (resp. R1I R2I ) holds for each concept inclusion (resp. role inclusion)</title>
        <p>axiom B v C (resp. R1 v R2) in T . A KB is satisfiable iff it admits at least one
model. For readability, in what follows we focus on satisfiable KBs, that is, we
use “a KB” as a shortcut for “a satisfiable KB”.</p>
        <sec id="sec-2-3-1">
          <title>A conjunctive query (CQ) q is an expression of the form</title>
          <p>q(x) p1(t1); : : : ; pn(tn), where x NV, and each pi(ti) is an atom with
arguments in NV [ NI. In addition, we require safeness, i.e., x t1 [ [ tn.</p>
        </sec>
        <sec id="sec-2-3-2">
          <title>We use dist(q) to denote x, called the distinguished variables of q, and we</title>
          <p>
            use body(q) to denote fp1(t1); : : : ; pn(tn)g. The Gaifman graph Gq of q is the
undirected graph whose vertices are the variables appearing in body(q), and
that contains an edge between x1 and x2 iff P (x1; x2) 2 body(q) for some role
P . Following [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ], we call a CQ q rooted if each connected component in Gq
contains at least one constant or distinguished variable.
          </p>
          <p>
            We adopt the semantics proposed in [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] for counting answers to a CQ over
a KB, which we call count semantics. If f is a function and D a subset of its
domain, then we use f jD to denote f restricted to D. A match for a CQ q
over an interpretation I is a homomorphism from body(q) to I. Let M be the
set of matches for q over I, let ! be a mapping from dist(q) to NI, and let
= f 2 M j jdist(q) = !g. Then the pair h!; j ji is an answer to q over I iff
j j 1. And we use ans(q; I) to denote the set of answers to q over I. Finally,
a pair h!; ki is a certain answer to q over a KB K (under count semantics )
iff k is the largest element of N [ f+1g such that, for each model I of K,
h!; kI i 2 ans(q; I) for some kI k. We use certAns(q; K) to denote the set of
certain answers to q over K.
          </p>
        </sec>
        <sec id="sec-2-3-3">
          <title>A property that has been extensively studied in the OBDA/OMQA liter</title>
          <p>
            ature is FO rewritability [
            <xref ref-type="bibr" rid="ref1 ref6">6,1</xref>
            ]. Query answering for a class Q of queries is FO
rewritable for a DL L iff, for every L TBox T and Q-query q, there is a FO query
q0 such that, for every ABox A, the certain answers to q over hT ; Ai and the
answers to q0 over A (viewed as an interpretation) coincide. In particular, under
standard certain answer semantics for DLs (which differs from the definition of
certAns(q; K) under count semantics given above), query answering for UCQs
is FO rewritable for several members of the DL-Lite family. Since RA captures
(domain-independent) FO logic, the evaluation of the query obtained via
rewriting can be delegated to a RDBMS. Then in [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ], this notion of rewritability has
been adapted to a different target query language, called BCALC, which
captures relational bag algebra [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ], and can be translated into SQL queries with
aggregation.
3
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>
        Query answering under count semantics over a relational DB can be viewed as
a specific case of query answering under bag semantics, investigated notably in
[
        <xref ref-type="bibr" rid="ref10 ref12">10,12</xref>
        ]. Instead, in our setting, and in line with the OMQA/OBDA literature,
we assume that the input ABox is a set rather than a bag. The counting problem
over sets has also been studied recently in the relational DB setting [
        <xref ref-type="bibr" rid="ref15 ref9">15,9</xref>
        ], but
from the perspective of combined complexity, where the query is considered part
of the input (i.e., its size is not assumed to be fixed).
      </p>
      <sec id="sec-3-1">
        <title>As for (DL-Lite) KBs, in [8] an alternative (epistemic) count semantics is</title>
        <p>defined, which counts over all grounded tuples (i.e., over NI) entailed by the KB.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Such a semantics does not account for existentially implied individuals, and thus cannot capture the statistics motivating our work.</title>
      </sec>
      <sec id="sec-3-3">
        <title>Closer to our concern is the work presented in [11], which introduces the</title>
        <p>count semantics for CQs over a DL KB adopted in this paper. In particular the</p>
      </sec>
      <sec id="sec-3-4">
        <title>Count problem, which is the decision variant of query answering under count se</title>
        <p>
          mantics, was shown in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] to be coNP-hard in data complexity for the relatively
inexpressive DL DL-Litecore , even when negated concepts are forbidden. This
implies that for this DL and arbitrary CQs, query answering under count
semantics is not rewritable into a target query language for which query answering is
easier than coNP in data complexity.
        </p>
        <p>
          Then, rewritability of CQs over a DL-Lite KB was investigated in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] for
the related problem of query answering under bag semantics. In particular, a
rewriting algorithm was provided for rooted CQs into the language BCALC
(which can be evaluated in TC0), and for a DL that extends DL-Litecore with a
restricted form of role subsumption. This result does not immediately transfer
to our setting though, for two reasons. First, this DL cannot express number
restrictions on the right-hand side (RHS) of TBox axioms, and therefore cannot
encode statistics like the ones from our motivating example. Second, as shown
in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], for DL-Litecore already, bag and count semantics differ in the presence
of existential quantification on the left-hand side (LHS) of TBox axioms, which
are typically used to express the domain and range of an atomic role.
        </p>
      </sec>
      <sec id="sec-3-5">
        <title>To understand this difference, we recall the basics of query answering under</title>
        <p>
          bag semantics (and refer to [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] for a complete definition.) A bag interpretation I
maps each atomic concept A (resp. atomic role P ) to a bag AI (resp. P I ). Such
bag can be seen as a function that maps each element of I (resp. I I )
to the number of times it occurs in the bag. This function :I is then extended
to complex concepts and roles inductively (see [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]). And I satisfies an axiom
C1 v C2 (resp. R1 v R2) iff (C1)I (d) (C2)I (d) for every d 2 I (resp.
(R1)I (d1; d2) (R2)I (d1; d2) for every (d1; d2) 2 I I ).
        </p>
        <p>Now let q(x) p1(t1); : : : ; pn(tn) be a CQ, let I be a bag interpretation, let</p>
        <sec id="sec-3-5-1">
          <title>V be the set of variables that appear in q, let ! be a mapping from x to NI, let</title>
          <p>= f : V ! I j jdist(q) = !g, and let k = P 2 Q1 i n(pi)I ( (ti)). Then
h!; ki is a bag answer to q over I iff k 1. We use bagAns(q; I) to denote the
set of bag answers to q over I. And if K is a KB, we use bagCertAns(q; K) to
denote the certain answers to q over K under bag semantics, defined analogously
to certain answers under count semantics.3</p>
        </sec>
      </sec>
      <sec id="sec-3-6">
        <title>The difference between bag and count semantics is illustrated in [14] with the following example:</title>
        <p>Example 1. Consider the KB K = hT ; Ai, where</p>
        <p>T = fA1 v 9P; 9P
v A2g;</p>
        <p>A = fA1(a); A1(b)g;
and the query q() A2(y). If we evaluate this query under count semantics,
then certAns(q; K) = fhfg; 1ig (i.e., the only certain answer to q is the mapping
with empty domain fg, with multiplicity 1), because the following structure is a
model of K:
a
A1</p>
        <p>P</p>
        <p>
          P
u
A2
b
A1
However, such structure does not accurately represent a bag interpretation. Let
us now build a (minimal) bag interpretation I for K. To satisfy A, we set A1I (a) =
1 and A1I (b) = 1. Then to satisfy A1 v 9P , we introduce a single element u (as
above) and obtain P I (a; u) = 1 and P I (b; u) = 1. Therefore (P )I (u; a) = 1 and
(P )I (u; b) = 1, which, according to the semantics proposed in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], imply that
(9P )I (u) = 2. Then in order for this bag interpretation to satisfy 9P v A2, it
must be the case that (A2)I (u) = 2. So the only certain answer to q over K under
bag semantics is the empty mapping with multiplicity 2, i.e. bagCertAns(q; K) =
fhfg; 2ig.
        </p>
      </sec>
      <sec id="sec-3-7">
        <title>It was also shown in [14] that query answering under bag and count semantics</title>
        <p>
          coincide for the DL that extends DL-Litecore with role inclusion, but disallows
concepts of the form 1 R on the LHS of axioms. which we denote here with
3 The notation used in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] for certain answers under both bag and count semantics
is slightly different from ours. Let dist(q) = fx1; ::; xng. Then the certain answers to
q under bag semantics are represented in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] as a partial function : (NC)n ! N+.
Instead, we use bagCertAns(q; K) = fhfx1 7! c1; ::; xn 7! cng; ki j (c1; ::; cn) = kg.
As for count semantics, let h!; ki 2 certAns(q; K). Then the definition provided in
Section 2 implies that there is no k0 6= k such that h!; k0i 2 certAns(q; K). Instead,
in [
          <xref ref-type="bibr" rid="ref11 ref14">11,14</xref>
          ], h!; k0i is considered a certain answer under count semantics for each
1 k0 k.
        </p>
        <sec id="sec-3-7-1">
          <title>DL-LitecHo69re . However, our focus is once again on logics that allow for number</title>
          <p>restrictions on the RHS of axioms (and thus can encode statistics). So a
natural question is whether this result still holds for DL-LitecHo69re extended with
– –
such axioms. Let DL-LitecHoN–re 69 denote this DL (equivalently, DL-LitecHoNre 69 is
the fragment of DL-LitecHoNre that disallows concepts of the form 1 R on the</p>
        </sec>
      </sec>
      <sec id="sec-3-8">
        <title>LHS of axioms). To answer this question, the bag semantics proposed in [14]</title>
        <p>needs to be extended to concepts of the form n R. One way to do this is to
define ( n R)I (d1) = Pd22 I RI (d1; d2) =n, for any bag interpretation I and
d1 2 I , where “ =” denotes integer division. With this definition, the pr–oof
provided in [14, Proposition 85] can be immediately extended to DL-LitecHoNre 69:
–
Proposition 2. For any DL-LitecHoNre 69 KB K and CQ q</p>
        <p>certAns(q; K) = bagCertAns(q; K)</p>
      </sec>
      <sec id="sec-3-9">
        <title>Proof (sketch). The proof of Proposition 85 in [14] uses the fact that for</title>
        <p>any DL-LitecHo69re KB K and model I of K under count semantics, there is a
bag interpretation Ib that is a model of K under bag semantics, and verifies
ans(q; I) = bagAns(q; Ib) for any CQ q. This bag interpretation is defined for
M 2 NC [ NR by M Ib (t) = 1 if t 2 M I , and M Ib (t) = 0 otherwise. Now for
any axiom of the form A v n R, it can be verified that if AI ( n R)I , then
AIb (d) ( n R)Ib (d) holds, for any individual d. So if K is now a DL-LitecHoNre–69
KB and I a model of K, then Ib as defined above is still a model of K under bag
semantics.</p>
        <p>
          The proof of Proposition 85 in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] also uses the fact that for any DL-LitecHo69re
bag KB K and model I of K under bag semantics, there is an interpretation Is
that is a model of K under count semantics, and such that for any CQ q, mapping
! over dist(q) and k 2 N+ [ f+1g, if h!; ki 2 bagAns(q; I), then h!; k0i 2
ans(q; Is) for some k0 k. This interpretation is defined for M 2 NC [ NR by
t 2 M Is iff M I (t) 1. Again, it can be verified that Is is still a model of K if
–
K is a DL-LitecHoNre 69 KB.
        </p>
      </sec>
      <sec id="sec-3-10">
        <title>The rest follows from the original proof.</title>
        <p>tu
–</p>
        <p>
          Finally, in [
          <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
          ], for DL-LitecNore and under count semantics, a query
rewriting algorithm was provided for rooted CQs into a variant of RA extended with
aggregation and arithmetic functions. As for the rewriting of [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], the output
query does not depend on the ABox. However, such algorithm is mostly of
theoretical interest, and not well suited for implementation (see Section 4). Two
negative results were also provided in [
          <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
          ], which further circumscribe the scope
of rewritability. Specifically, for DL-LitecHore , Count was shown to be P-hard
both for rooted CQs and for CQs whose body contains a single atom. This
implies that for this DL and these classes of queries, query answering under
count semantics is not rewritable into a target query language for which query
answering is easier than P.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Rewriting</title>
      <p>
        Query answering under count semantics was shown in [
        <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
        ] to be rewritable for
–
rooted CQs and DL-LitecNore , into a target query language that extends RA with
aggregation and some algebraic functions, thanks to a query rewriting algorithm
inspired by PerfectRef [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. However, the size of the rewritten query may be
exponential in the size of the input TBox. More specifically, it may be exponential
both in the number of axioms (precisely, in the depth of the deepest concept
hierarchy that can be inferred from the TBox), and in the encoding of numbers
that appear in number restrictions (if encoded in binary, thus polynomial in the
value of such numbers). In many applications, it is reasonable to assume that the
number of axioms in the TBox remains limited, so the first source of exponential
blow-up may not be a major practical limitation. On the other hand, as
illustrated in Section 1, values that appear in number restrictions (such as the one in
20000 enrolledIn ) may depend on the size of the domain under consideration,
and thus, in some applications, they are likely to be of the same order of
magnitude as the size of the ABox itself. This might be unmanageable in practice,
in scenarios where we have to deal with large amounts of data.
      </p>
      <sec id="sec-4-1">
        <title>In the following, we describe how the rewriting algorithm of [4,5] can be</title>
        <p>
          adapted (for the same DL, source, and target query languages), so that the size
of the output query remains linear in the size of the (binary) encoding of numbers
in number restrictions. Moreover, this new rewriting guarantees that the number
of algebraic operators in the output query only depends on the number of axioms
of the TBox. The algorithm of [
          <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
          ] exploits the so-called chase model Ican of
K
–
K. This model is such that, for DL-LitecNore and rooted CQs, certAns(q; K) and
ans(q; IcKan ) coincide.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Specifically, we illustrate how the rewriting algorithm from [4,5] can be mod</title>
        <p>
          ified by running it on the same example as the one used in [
          <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
          ]. Whenever
relevant, we will explicitly point out the differences between the two algorithms.
The rewriting from [
          <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
          ] generates a set of queries that can be directly translated
into a SQL query. The target language for this rewriting makes use of nested
aggregation in the form of special “atoms” of the form 9=k:P (x; y), with k 2 N,
which intuitively correspond to a SQL COUNT DISTINCT operation, together with
a boolean condition stating that the result of the aggregation operation over y
must be equal to k. If the TBox contains an axiom the form C v n R, then for
each k 2 [1::n], the rewriting of [
          <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
          ] generates one sub-query. Hence, the number
of generated sub-queries is linear in n, and exponential in the binary encoding
of n, which makes it unpractical.
        </p>
        <p>
          In our variant, we drop the boolean condition and use instead the
notation z = count(y):P (x; y), where z is now a variable storing the result of the
same aggregation operation. Like in [
          <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
          ], we use negation, multiplication, and
subtraction. In addition, our target language also requires the SQL function
greatest(x,y) (that returns the largest value between x and y), and the
aggregation operator SUM. We show through our running example that, despite our
modifications, the target language for the rewriting can still be translated into
SQL.
P2
        </p>
        <p>P2
P2</p>
        <p>P1
a
A</p>
        <p>P1
b</p>
        <p>P2
P2</p>
        <p>P2
d
e
hQ(x; cnt = count(y1; y2)); fq(x; y1; y2)
A(x); P1(x; y1); P2(y1; y2)gi</p>
        <sec id="sec-4-2-1">
          <title>Intuitively, such query corresponds to the following SQL query4</title>
          <p>SELECT x, COUNT ( DISTINCT y1 , y2) as cnt
FROM A, P1 , P2
WHERE A.x = P1.x A AND P1.y1 = P2.y1
GROUP BY x</p>
          <p>Then each rewriting step selects a query Q in Q, and extends Q with a set
of new queries, obtained by applying some rewriting rule to Q, until saturation
is reached. In the previous query, since variable y2 is unbound, we can apply
a rewriting step over atom P2(y1; y2) with respect to axiom 9P1 v 3 P2,
producing the query:
hQ(x; cnt = sum(num));
hQ(x; y1; num = greatest(0; 3 z));</p>
          <p>fq(x; y1) A(x); P1(x; y1); P1(_; y1)g ^ z = cnt(y2): P2(y1; y2)ii</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>This query corresponds to the following SQL query:</title>
        <p>SELECT x, SUM(num) as cnt
FROM ( SELECT x, y1 , greatest (0, 3-z) as num</p>
        <p>FROM ( SELECT x, y1</p>
        <p>FROM A, P1 , ( SELECT x as _, y1 FROM P1) as P1a)
WHERE A.x = P1.x AND P1.y1 = P1a.y1
) AS J1 ,
( SELECT y1 , COUNT ( DISTINCT y2) as z
FROM P2 GROUP BY y1
) AS J2
WHERE J1.y1 = J2.y1
4 Note that the COUNT DISTINCT operator of SQL does not allow for multiple arguments.</p>
        <p>However, the desired result can be achieved through an injective concatenation
operator, for instance making use of an additional fresh symbol.</p>
        <p>
          The interested reader might observe that such query fully captures the three
queries (one for each non-zero value of num) from [
          <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
          ]. On such a query, one
can apply a variant of the “reduce” rule of PerfectRef [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], leading to the query:
hQ(x; cnt = sum(num));
hQ(x; y1; num = greatest(0; 3 z));
q(x; y1) A(x); P1(x; y1); P1(_; y1)
q(x; y1) A(x); P1(x; y1)
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>This query corresponds to the following SQL query:</title>
        <p>SELECT x, SUM(num) as cnt
FROM ( SELECT x, y1 , greatest (0,3-z) as num
FROM (( SELECT x, y1</p>
        <p>FROM A, P1 , ( SELECT x as _, y1 FROM P1) AS P1a
WHERE A.x = P1.x AND P1.y1 = P1a.y1
)
UNION
( SELECT x, y1 FROM A, P1 WHERE A.x = P1.x)
) AS J1 ,
( SELECT y1 , COUNT ( DISTINCT y2) as z
FROM P2
GROUP BY y1
) AS J2
WHERE J1.y1 = J2.y1
^ z = cnt(y2): P2(y1; y2)ii</p>
      </sec>
      <sec id="sec-4-5">
        <title>Again, in the rewriting from [4,5], this very step produces three different queries: one for each non-zero value of num.</title>
        <sec id="sec-4-5-1">
          <title>By applying another rewriting step over atom P1(x; y1) and with respect to</title>
          <p>axiom A v 2 P1, we obtain the following query:
hQ(x; cnt = sum(num));
hQ(x; num = (greatest(0; (2 z) 3);</p>
          <p>fq(x) A(x)g ^ z = cnt(y1): P1(x; y1)ii</p>
        </sec>
      </sec>
      <sec id="sec-4-6">
        <title>This query corresponds to the following SQL query:</title>
        <p>SELECT x, SUM(num) as cnt
FROM ( SELECT x, greatest (0,2-z) * 3 as num
FROM ( SELECT x FROM A) AS J1 ,
( SELECT x, COUNT ( DISTINCT y1) AS z
FROM P1
GROUP BY x
) AS J2
WHERE J1.x = J2.x</p>
      </sec>
      <sec id="sec-4-7">
        <title>Let us now analyze the queries produced by the rewriting. The query after</title>
        <p>the initialization step returns the number of paths (x; y1; y2) in A conforming to
the structure dictated by the body of the input query. Since there are two such
paths, this query returns the answer fx 7! a; cnt 7! 2g. The query produced after
the first rewriting step checks for all sub-paths (x; y1) of (x; y1; y2) such that x
is an instance of A, y1 is a P1-successor of x, and y1 has less P2-successors in the
ABox than what the TBox prescribes. There is one such path in IcKan , namely
the one terminating in node b that has only two P2-successors in A. This path
is captured by our query, which returns as answer fx 7! a; cnt 7! 1g: indeed,
there is a single way of extending this path into the anonymous part. The last
query has to be interpreted in a similar way. The query retrieves the individual
a, since this node has only one P1-successor in A but should have at least two
P1-successors according to T . The answer to such query is fx 7! a; cnt 7! 3g.
Indeed, there are three ways of extending the match fx 7! ag into the anonymous
part. Now, a last aggregation (with SUM) over all queries in Q yields the desired
answer fx 7! a; cnt 7! 6g, which indeed corresponds to the answer hfx 7! ag; 6i
to our input query over IcKan .
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>
        In this paper, we have improved the query rewriting technique proposed in [
        <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
        ]
–
for rooted CQs under count semantics over a DL-LitecNore KB, into a target
query language that extends RA with aggregation and arithmetic functions. We
have illustrated how the exponential blow-up of the output query in the numbers
that occur in the TBox, characteristic of the rewriting of [
        <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
        ], can be avoided by
extending the target rewriting language, specifically with the aggregate operator
sum and with the binary function greatest. We also show that this enriched
language can still be translated into SQL. Considering that numeric values that
appear in the TBox may in practice be of the same order of magnitude as the
size of the ABox, this new rewriting algorithm is a significant step towards a
practical implementation of query answering under this semantics.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <sec id="sec-6-1">
        <title>This research has been partially supported by the Wallenberg AI, Autonomous</title>
      </sec>
      <sec id="sec-6-2">
        <title>Systems and Software Program (WASP) funded by the Knut and Alice Wallen</title>
        <p>berg Foundation, by the Italian Basic Research (PRIN) project HOPE, by the</p>
      </sec>
      <sec id="sec-6-3">
        <title>EU H2020 project INODE, grant agreement 863410, by the CHIST-ERA project</title>
      </sec>
      <sec id="sec-6-4">
        <title>PACMEL, by the project IDEE (FESR1133) funded through the European Re</title>
        <p>gional Development Fund (ERDF) Investment for Growth and Jobs Programme
2014-2020, by the project “South Tyrol Longitudinal study on Longevity and</p>
      </sec>
      <sec id="sec-6-5">
        <title>Ageing” (STyLoLa), funded through the 2017 Interdisciplinary Call issued by</title>
        <p>the Research Committee of the Free University of Bozen-Bolzano, and by the
project “Ontology-based Geodata Integration, Visualization and Analysis”
(OntoGeo), funded through the 2018 CRC Call issued by the Research Committee
of the Free University of Bozen-Bolzano.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>A.</given-names>
            <surname>Artale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kontchakov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zakharyaschev</surname>
          </string-name>
          .
          <article-title>The DL-Lite family and relations</article-title>
          .
          <source>J. of Artificial Intelligence Research</source>
          ,
          <volume>36</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>69</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nardi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Patel-</surname>
          </string-name>
          Schneider, editors.
          <source>The Description Logic Handbook: Theory, Implementation and Applications</source>
          . Cambridge University Press,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>M.</given-names>
            <surname>Bienvenu</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Ortiz</surname>
          </string-name>
          .
          <article-title>Ontology-mediated query answering with datatractable description logics</article-title>
          .
          <source>In Reasoning Web: Web Logic Rules - 11th Int. Summer School Tutorial Lectures (RW)</source>
          , volume
          <volume>9203</volume>
          <source>of LNCS</source>
          , pages
          <fpage>218</fpage>
          -
          <lpage>307</lpage>
          . Springer,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Corman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lanti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Razniewski</surname>
          </string-name>
          .
          <article-title>Counting query answers over a DL-Lite knowledge base</article-title>
          .
          <source>In Proc. of the 29th Int. Joint Conf. on Artificial Intelligence (IJCAI)</source>
          .
          <source>IJCAI Org.</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Corman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lanti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Razniewski</surname>
          </string-name>
          .
          <article-title>Counting query answers over a DL-Lite knowledge base (Extended version)</article-title>
          .
          <source>CoRR Tech. Rep</source>
          . arXiv:
          <year>2005</year>
          .
          <volume>05886</volume>
          , arXiv.org e-Print archive,
          <year>2020</year>
          . Available at http://arxiv.org/ abs/
          <year>2005</year>
          .05886.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          , G. De Giacomo,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          .
          <article-title>Tractable reasoning and efficient query answering in description logics: The DL-Lite family</article-title>
          .
          <source>J. of Automated Reasoning</source>
          ,
          <volume>39</volume>
          (
          <issue>3</issue>
          ):
          <fpage>385</fpage>
          -
          <lpage>429</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          , G. De Giacomo, and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          .
          <article-title>Conjunctive query containment and answering under description logics constraints</article-title>
          .
          <source>ACM Trans. on Comp. Logic</source>
          ,
          <volume>9</volume>
          (
          <issue>3</issue>
          ):
          <fpage>22</fpage>
          .
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          .31,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kharlamov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Nutt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Thorne</surname>
          </string-name>
          .
          <article-title>Aggregate queries over ontologies</article-title>
          .
          <source>In Proc. of the 2nd Int. Workshop on Ontologies and Inf. Systems for the Semantic Web (ONISW)</source>
          , pages
          <fpage>97</fpage>
          -
          <lpage>104</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Mengel</surname>
          </string-name>
          .
          <article-title>Counting answers to existential positive queries: a complexity classification</article-title>
          .
          <source>In Proc. of the 35th ACM Symp. on Principles of Database Systems (PODS)</source>
          , pages
          <fpage>315</fpage>
          -
          <lpage>326</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>S.</given-names>
            <surname>Grumbach</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Milo</surname>
          </string-name>
          .
          <article-title>Towards tractable algebras for bags</article-title>
          .
          <source>J. of Computer and System Sciences</source>
          ,
          <volume>52</volume>
          (
          <issue>3</issue>
          ):
          <fpage>570</fpage>
          -
          <lpage>588</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>E. V.</given-names>
            <surname>Kostylev</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Reutter</surname>
          </string-name>
          .
          <article-title>Complexity of answering counting aggregate queries over DL-Lite</article-title>
          .
          <source>J. of Web Semantics</source>
          ,
          <volume>33</volume>
          :
          <fpage>94</fpage>
          -
          <lpage>111</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>L.</given-names>
            <surname>Libkin</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Wong</surname>
          </string-name>
          .
          <article-title>Query languages for bags and aggregate functions</article-title>
          .
          <source>J. of Computer and System Sciences</source>
          ,
          <volume>55</volume>
          (
          <issue>2</issue>
          ):
          <fpage>241</fpage>
          -
          <lpage>272</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>B.</given-names>
            <surname>Motik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fokoue</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          .
          <article-title>OWL 2 Web Ontology Language profiles</article-title>
          (2nd ed.).
          <source>W3C Rec</source>
          .,
          <source>W3C</source>
          ,
          <year>2012</year>
          . http://www. w3.org/TR/owl2-profiles/.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>C. Nikolaou</surname>
            ,
            <given-names>E. V.</given-names>
          </string-name>
          <string-name>
            <surname>Kostylev</surname>
            , G. Konstantinidis,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Kaminski</surname>
            ,
            <given-names>B. Cuenca</given-names>
          </string-name>
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>and I. Horrocks.</given-names>
          </string-name>
          <article-title>Foundations of ontology-based data access under bag semantics</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>274</volume>
          :
          <fpage>91</fpage>
          -
          <lpage>132</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>R.</given-names>
            <surname>Pichler</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Skritek</surname>
          </string-name>
          .
          <article-title>Tractable counting of the answers to conjunctive queries</article-title>
          .
          <source>J. of Computer and System Sciences</source>
          ,
          <volume>79</volume>
          (
          <issue>6</issue>
          ):
          <fpage>984</fpage>
          -
          <lpage>1001</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. G. Xiao,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kontchakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Poggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zakharyaschev</surname>
          </string-name>
          .
          <article-title>Ontology-based data access: A survey</article-title>
          .
          <source>In Proc. of the 27th Int. Joint Conf. on Artificial Intelligence (IJCAI)</source>
          , pages
          <fpage>5511</fpage>
          -
          <lpage>5519</lpage>
          . IJCAI Org.,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>