Rewriting Count Queries over DL-Lite TBoxes with Number Restrictions? Diego Calvanese1,2 , Julien Corman1 , Davide Lanti1 , and Simon Razniewski3 1 Free University of Bozen-Bolzano, Italy, lastname @inf.unibz.it 2 Umeå University, Sweden, diego.calvanese@umu.se 3 Max-Planck-Institut für Informatik, Saarbrücken, Germany, srazniew@mpi-inf.mpg.de Abstract. We propose a query rewriting algorithm for a restricted class of conjunctive queries evaluated under count semantics over a DL-Lite knowledge base. The target query language is an extension of relational algebra with aggregation and arithmetic functions, which can be trans- lated into SQL. The algorithm supports number restrictions on the RHS of axioms in the input TBox, which can be used to encode statistics. The size of the output query remains linear in the binary encoding of these numbers, which improves upon previously proposed approaches. 1 Introduction Counting answers to a query is an essential operation in data management, and is supported by virtually every relational database management system (RDBMS). In this paper, we focus on counting answers over a knowledge base (KB), which may be viewed as a database (DB) enriched with background knowledge about the domain of interest. Our work falls within the scope of Ontology Mediated Query Answering (OMQA)1 [7,3], where the background knowledge takes the form of a set of logical statements, called TBox, and the data is represented as a set of facts, called ABox. TBoxes are in general expressed in Description Logics (DLs), which are decidable fragments of first-order (FO) logic. In this setting, counting answers to a query over a KB may require operations that go beyond counting ABox facts, as one also needs to take into account facts that can be inferred through the TBox. For instance, consider a scenario where a central repository keeps track of aggregated information about universities, such as their size, measured in number of students. And assume that these universities also keep detailed records about enrollment. These data sources may be integrated into a DL KB with a TBox of the form: Uni v ≥1000 enrolledIn− MedUni v ≥5000 enrolledIn− BigUni v ≥20000 enrolledIn− ? Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0) 1 Also referred to as OBDA (for Ontology Based Data Access), when emphasis is placed on mappings connecting external data sources to a TBox [16]. 2 Calvanese et al. And an ABox of the form: Uni(UniBZ), BigUni(UW), BigUni(TUW), . . . , region(UniBZ, SouthTyrol), region(UW, Vienna), region(TUW, Vienna), . . . , enrolledIn(Alice, UniBZ), enrolledIn(Bob, UW), enrolledIn(Carol, TUW), . . . In practice, the information about enrollment in such KB may be incomplete. First, aggregated data about a whole university may be available, but not the detailed records. The opposite may also hold: detailed records may be provided by a university, but the central repository may not have an aggregated value for it. In such scenarios, even if the data is incomplete, one may still want to compute a lower bound on the number of enrollments of students within each region. In more formal terms, one might be interested in counting the number of answers, as in the following query: q(count(x, y)) ← enrolledIn(x, y) ∧ region(y, Vienna) Answering such query may require arithmetic operations that combine num- bers present in number restrictions in the TBox, and numbers obtained by count- ing ABox statements. For instance, in this example, assume that TUW and UW are the only two universities in the KB that are registered in the region of Vienna. Assume also that UW does not share enrollment records, whereas TUW shares them, however they may be incomplete (for instance, because some faculties within TUW have not yet communicated their information regarding enroll- ments). Then, the output value for count(x, y) should be the sum of: 20 000 on the one hand (for UW ), and the largest value between 20 000 and n on the other hand (for TUW ), where n is the number of enrollment records shared by TUW. Query answering over DL KBs has been extensively studied for boolean and enumeration queries. With respect to the focus of this paper, a relevant result [6] is that for conjunctive queries (CQs) and unions of CQs (UCQs), answering a query over a KB expressed in certain lightweight DLs can be performed by rewriting it, according to the axioms in the TBox, into a relational algebra (RA) query over the ABox. This property, called FO rewritability, allows one to use an RDBMS for the evaluation of the rewritten query, thus leveraging already mature optimization techniques implemented in such systems. Among these DLs, the DL-Lite family [6,1] has been widely studied and adopted in OMQA/OBDA systems, resulting in the OWL 2 QL standard [13]. In comparison, the problem of counting answers to a CQ over a DL-Lite KB has seen little interest in the literature. A seminal result was provided in [11], showing that the problem is significantly harder already for relatively inex- pressive DLs, with a shift from AC0 -membership to coNP-completeness in data complexity, i.e., assuming the sizes of the query and TBox to be fixed. Another key result was provided in [14], but for a slightly different semantics called bag semantics: for certain variants of DL-Lite, CQs that are rooted (i.e., with at least one constant or answer variable) can be rewritten into a variant of bag RA. However, this result is provided for DLs without number restrictions (like ≥1000 enrolledIn− in the example above). In addition, because bag semantics dif- fers from the count semantics studied in [11], this rewritability result does not Rewriting Count Queries over DL-Lite TBoxes with Number Restrictions 3 directly apply to our setting. Finally, in [4,5], it was shown that under count semantics (and for some variants of DL-Lite), rooted2 CQs can be rewritten into a variant of RA with aggregation and arithmetic functions. However, the provided rewriting algorithm may yield a query whose size is polynomial in the values of the numbers that occur in the TBox, i.e., exponential in the binary representation of such numbers, which is impractical. In this paper, we improve upon this last result, showing how for the same DL, and under count semantics, rooted CQs can be rewritten into a different extension of RA, such that the number of algebraic operators in the output query does not depend on the numbers that occur in the TBox, but only on the number of axioms of the TBox. As a consequence, the size of the resulting query is polynomial in the numbers that appear in the number restrictions of the TBox, even when such numbers are represented in binary. This is an important step towards efficient query answering in such setting. 2 Preliminaries We assume mutually disjoint countably infinite sets NI of individuals (a.k.a. constants), NE of anonymous individuals (induced by existential quantification), NV of variables, NC of concept names (i.e., unary predicates, denoted with A), and NR of role names (i.e., binary predicates, denoted with P ). An atom is an expression of the form A(s) or P (s, t), with A ∈ NC , P ∈ NR , and s, t ∈ NI ∪ NE ∪ NV . In the following, boldface letters like t denote tuples, and we sometimes treat tuples as sets. If t = (t1 , .., tn ) is a tuple and f a function defined for each ti , we use f (t) to denote the tuple (f (t1 ), .., f (tn )). An interpretation I is a FO structure h∆I , ·I i, where the domain ∆I is a non-empty subset of NI ∪ NE , and the interpretation function ·I maps each constant c ∈ NI to itself (cI = c, or in other words, we adopt the standard names assumption), each concept name A ∈ NC to a set AI ⊆ ∆I , and each role name P ∈ NR to a binary relation P I ⊆ ∆I × ∆I . If I = h∆I , .I i is an interpretation, we may use I to denote the set of atoms {A(s) | A ∈ NC , s ∈ AI } ∪ {P (s, t) | P ∈ NR and (s, t) ∈ P I }. Conversely, if S is a set of unary and binary atoms with arguments in NI ∪ NE , we may view it as the interpretation h∆S , ·S i, where ∆S is the union of the arguments of all atoms in S, and ·S is defined by AS = {s | A(s) ∈ S}, for A ∈ NC , and P S = {(s, t) | P (s, t) ∈ S}, for P ∈ NR . A KB is a pair K = hT , Ai, where A, called ABox, is a finite set of atoms with arguments in NI , and T , called TBox, is a finite set of axioms. In this 2 Precisely, the result provided in [4,5] holds for rooted and connected CQs. This result immediately extends to rooted but disconnected CQs, by observing that the definition of rootedness requires each connected component to be rooted. Therefore, to answer such CQ, it suffices to compute the answers to each connected component separately, and then the cartesian product of these answers, multiplying the obtained cardinalities. 4 Calvanese et al. R −→ P | P − B −→ A | ≥1 R C −→ A | ¬A | ≥n R – Fig. 1. Syntax of DL-Lite HNcore roles R, basic concepts B, and concepts C, where n denotes a positive integer, i.e., n ∈ N+ . – paper, we focus on (fragments of) DL-Lite HN core [4,5], a member of the the DL- Lite family [1] where each axiom is a concept inclusion of the form B v C or a role inclusion of the form R1 v R2 , following the grammar of Figure 1. From now on, we use the symbols B, C, P and R in accordance with this grammar, and we call these elements respectively basic concepts, concepts, atomic roles and roles. Concepts of the form ≥n R are called number restrictions. The fragment of – DL-Lite HN core that disallows role inclusions and number restrictions where n > 1 is called DL-Lite core [1]. The evaluation C I (resp. RI ) of a concept C (resp. role R) over interpretation an I is defined inductively over the structure of C (resp. R) as usual (see [2] for a definition). An interpretation I is a model of hT , Ai iff A ⊆ I holds, and B I ⊆ C I (resp. R1I ⊆ R2I ) holds for each concept inclusion (resp. role inclusion) axiom B v C (resp. R1 v R2 ) in T . A KB is satisfiable iff it admits at least one model. For readability, in what follows we focus on satisfiable KBs, that is, we use “a KB” as a shortcut for “a satisfiable KB”. A conjunctive query (CQ) q is an expression of the form q(x) ← p1 (t1 ), . . . , pn (tn ), where x ⊆ NV , and each pi (ti ) is an atom with arguments in NV ∪ NI . In addition, we require safeness, i.e., x ⊆ t1 ∪ · · · ∪ tn . We use dist(q) to denote x, called the distinguished variables of q, and we use body(q) to denote {p1 (t1 ), . . . , pn (tn )}. The Gaifman graph Gq of q is the undirected graph whose vertices are the variables appearing in body(q), and that contains an edge between x1 and x2 iff P (x1 , x2 ) ∈ body(q) for some role P . Following [14], we call a CQ q rooted if each connected component in Gq contains at least one constant or distinguished variable. We adopt the semantics proposed in [11] for counting answers to a CQ over a KB, which we call count semantics. If f is a function and D a subset of its domain, then we use f |D to denote f restricted to D. A match for a CQ q over an interpretation I is a homomorphism from body(q) to I. Let M be the set of matches for q over I, let ω be a mapping from dist(q) to NI , and let Γ = {γ ∈ M | γ|dist(q) = ω}. Then the pair hω, |Γ |i is an answer to q over I iff |Γ | ≥ 1. And we use ans(q, I) to denote the set of answers to q over I. Finally, a pair hω, ki is a certain answer to q over a KB K (under count semantics) iff k is the largest element of N ∪ {+∞} such that, for each model I of K, hω, kI i ∈ ans(q, I) for some kI ≥ k. We use certAns(q, K) to denote the set of certain answers to q over K. A property that has been extensively studied in the OBDA/OMQA liter- ature is FO rewritability [6,1]. Query answering for a class Q of queries is FO rewritable for a DL L iff, for every L TBox T and Q-query q, there is a FO query q 0 such that, for every ABox A, the certain answers to q over hT , Ai and the answers to q 0 over A (viewed as an interpretation) coincide. In particular, under Rewriting Count Queries over DL-Lite TBoxes with Number Restrictions 5 standard certain answer semantics for DLs (which differs from the definition of certAns(q, K) under count semantics given above), query answering for UCQs is FO rewritable for several members of the DL-Lite family. Since RA captures (domain-independent) FO logic, the evaluation of the query obtained via rewrit- ing can be delegated to a RDBMS. Then in [14], this notion of rewritability has been adapted to a different target query language, called BCALC, which cap- tures relational bag algebra [10], and can be translated into SQL queries with aggregation. 3 Related Work Query answering under count semantics over a relational DB can be viewed as a specific case of query answering under bag semantics, investigated notably in [10,12]. Instead, in our setting, and in line with the OMQA/OBDA literature, we assume that the input ABox is a set rather than a bag. The counting problem over sets has also been studied recently in the relational DB setting [15,9], but from the perspective of combined complexity, where the query is considered part of the input (i.e., its size is not assumed to be fixed). As for (DL-Lite) KBs, in [8] an alternative (epistemic) count semantics is defined, which counts over all grounded tuples (i.e., over NI ) entailed by the KB. Such a semantics does not account for existentially implied individuals, and thus cannot capture the statistics motivating our work. Closer to our concern is the work presented in [11], which introduces the count semantics for CQs over a DL KB adopted in this paper. In particular the Count problem, which is the decision variant of query answering under count se- mantics, was shown in [11] to be coNP-hard in data complexity for the relatively inexpressive DL DL-Lite core , even when negated concepts are forbidden. This implies that for this DL and arbitrary CQs, query answering under count seman- tics is not rewritable into a target query language for which query answering is easier than coNP in data complexity. Then, rewritability of CQs over a DL-Lite KB was investigated in [14] for the related problem of query answering under bag semantics. In particular, a rewriting algorithm was provided for rooted CQs into the language BCALC (which can be evaluated in TC0 ), and for a DL that extends DL-Lite core with a restricted form of role subsumption. This result does not immediately transfer to our setting though, for two reasons. First, this DL cannot express number restrictions on the right-hand side (RHS) of TBox axioms, and therefore cannot encode statistics like the ones from our motivating example. Second, as shown in [14], for DL-Lite core already, bag and count semantics differ in the presence of existential quantification on the left-hand side (LHS) of TBox axioms, which are typically used to express the domain and range of an atomic role. To understand this difference, we recall the basics of query answering under bag semantics (and refer to [14] for a complete definition.) A bag interpretation I maps each atomic concept A (resp. atomic role P ) to a bag AI (resp. P I ). Such bag can be seen as a function that maps each element of ∆I (resp. ∆I × ∆I ) 6 Calvanese et al. to the number of times it occurs in the bag. This function .I is then extended to complex concepts and roles inductively (see [14]). And I satisfies an axiom C1 v C2 (resp. R1 v R2 ) iff (C1 )I (d) ≤ (C2 )I (d) for every d ∈ ∆I (resp. (R1 )I (d1 , d2 ) ≤ (R2 )I (d1 , d2 ) for every (d1 , d2 ) ∈ ∆I × ∆I ). Now let q(x) ← p1 (t1 ), . . . , pn (tn ) be a CQ, let I be a bag interpretation, let V be the set of variables that appear in q, let ωPbe a Q mapping from x to NI , let Γ = {γ : V → ∆I | γ|dist(q) = ω}, and let k = γ∈Γ 1≤i≤n (pi )I (γ(ti )). Then hω, ki is a bag answer to q over I iff k ≥ 1. We use bagAns(q, I) to denote the set of bag answers to q over I. And if K is a KB, we use bagCertAns(q, K) to denote the certain answers to q over K under bag semantics, defined analogously to certain answers under count semantics.3 The difference between bag and count semantics is illustrated in [14] with the following example: Example 1. Consider the KB K = hT , Ai, where T = {A1 v ∃P, ∃P − v A2 }, A = {A1 (a), A1 (b)}, and the query q() ← A2 (y). If we evaluate this query under count semantics, then certAns(q, K) = {h{}, 1i} (i.e., the only certain answer to q is the mapping with empty domain {}, with multiplicity 1), because the following structure is a model of K: a P u P b A1 A2 A1 However, such structure does not accurately represent a bag interpretation. Let us now build a (minimal) bag interpretation I for K. To satisfy A, we set AI1 (a) = 1 and AI1 (b) = 1. Then to satisfy A1 v ∃P , we introduce a single element u (as above) and obtain P I (a, u) = 1 and P I (b, u) = 1. Therefore (P − )I (u, a) = 1 and (P − )I (u, b) = 1, which, according to the semantics proposed in [14], imply that (∃P − )I (u) = 2. Then in order for this bag interpretation to satisfy ∃P − v A2 , it must be the case that (A2 )I (u) = 2. So the only certain answer to q over K under bag semantics is the empty mapping with multiplicity 2, i.e. bagCertAns(q, K) = {h{}, 2i}. It was also shown in [14] that query answering under bag and count semantics coincide for the DL that extends DL-Litecore with role inclusion, but disallows concepts of the form ≥1 R on the LHS of axioms. which we denote here with 3 The notation used in [14] for certain answers under both bag and count semantics is slightly different from ours. Let dist(q) = {x1 , .., xn }. Then the certain answers to q under bag semantics are represented in [14] as a partial function µ : (NC )n → N+ . Instead, we use bagCertAns(q, K) = {h{x1 7→ c1 , .., xn 7→ cn }, ki | µ(c1 , .., cn ) = k}. As for count semantics, let hω, ki ∈ certAns(q, K). Then the definition provided in Section 2 implies that there is no k0 6= k such that hω, k0 i ∈ certAns(q, K). Instead, in [11,14], hω, k0 i is considered a certain answer under count semantics for each 1 ≤ k0 ≤ k. Rewriting Count Queries over DL-Lite TBoxes with Number Restrictions 7 DL-Lite H6 ∃ core . However, our focus is once again on logics that allow for number restrictions on the RHS of axioms (and thus can encode statistics). So a nat- ural question is whether this result still holds for DL-Lite H6 ∃ core extended –with HN – 6∃ such axioms. Let DL-Litecore denote this DL (equivalently, DL-LiteHN core 6∃ is HN – the fragment of DL-Litecore that disallows concepts of the form ≥1 R on the LHS of axioms). To answer this question, the bag semantics proposed in [14] needs to be extended to P conceptsI of the form ≥n R. One way to do this is to define (≥n R)I (d1 ) = d2 ∈∆I R (d1 , d2 ) /n, for any bag interpretation I and I d1 ∈ ∆ , where “/” denotes integer division. With this definition, the proof – provided in [14, Proposition 85] can be immediately extended to DL-LiteHN core : 6∃ – Proposition 2. For any DL-LiteHN core 6∃ KB K and CQ q certAns(q, K) = bagCertAns(q, K) Proof (sketch). The proof of Proposition 85 in [14] uses the fact that for any DL-Lite H6 ∃ core KB K and model I of K under count semantics, there is a bag interpretation Ib that is a model of K under bag semantics, and verifies ans(q, I) = bagAns(q, Ib ) for any CQ q. This bag interpretation is defined for M ∈ NC ∪ NR by M Ib (t) = 1 if t ∈ M I , and M Ib (t) = 0 otherwise. Now for any axiom of the form A v ≥n R, it can be verified that if AI ⊆ (≥n R)I , then – AIb (d) ≤ (≥n R)Ib (d) holds, for any individual d. So if K is now a DL-LiteHNcore 6∃ KB and I a model of K, then Ib as defined above is still a model of K under bag semantics. The proof of Proposition 85 in [14] also uses the fact that for any DL-Lite H6 ∃ core bag KB K and model I of K under bag semantics, there is an interpretation Is that is a model of K under count semantics, and such that for any CQ q, mapping ω over dist(q) and k ∈ N+ ∪ {+∞}, if hω, ki ∈ bagAns(q, I), then hω, k 0 i ∈ ans(q, Is ) for some k 0 ≤ k. This interpretation is defined for M ∈ NC ∪ NR by t ∈ M Is iff M I (t) ≥ 1. Again, it can be verified that Is is still a model of K if – K is a DL-LiteHN core 6∃ KB. The rest follows from the original proof. t u – Finally, in [4,5], for DL-Lite N core and under count semantics, a query rewrit- ing algorithm was provided for rooted CQs into a variant of RA extended with aggregation and arithmetic functions. As for the rewriting of [14], the output query does not depend on the ABox. However, such algorithm is mostly of the- oretical interest, and not well suited for implementation (see Section 4). Two negative results were also provided in [4,5], which further circumscribe the scope of rewritability. Specifically, for DL-Lite H core , Count was shown to be P-hard both for rooted CQs and for CQs whose body contains a single atom. This implies that for this DL and these classes of queries, query answering under count semantics is not rewritable into a target query language for which query answering is easier than P. 8 Calvanese et al. 4 Rewriting Query answering under count semantics was shown in [4,5] to be rewritable for – rooted CQs and DL-LiteN core , into a target query language that extends RA with aggregation and some algebraic functions, thanks to a query rewriting algorithm inspired by PerfectRef [6]. However, the size of the rewritten query may be ex- ponential in the size of the input TBox. More specifically, it may be exponential both in the number of axioms (precisely, in the depth of the deepest concept hierarchy that can be inferred from the TBox), and in the encoding of numbers that appear in number restrictions (if encoded in binary, thus polynomial in the value of such numbers). In many applications, it is reasonable to assume that the number of axioms in the TBox remains limited, so the first source of exponential blow-up may not be a major practical limitation. On the other hand, as illus- trated in Section 1, values that appear in number restrictions (such as the one in ≥20000 enrolledIn− ) may depend on the size of the domain under consideration, and thus, in some applications, they are likely to be of the same order of mag- nitude as the size of the ABox itself. This might be unmanageable in practice, in scenarios where we have to deal with large amounts of data. In the following, we describe how the rewriting algorithm of [4,5] can be adapted (for the same DL, source, and target query languages), so that the size of the output query remains linear in the size of the (binary) encoding of numbers in number restrictions. Moreover, this new rewriting guarantees that the number of algebraic operators in the output query only depends on the number of axioms K of the TBox. The algorithm of [4,5] exploits the so-called chase model Ican of N– K. This model is such that, for DL-Litecore and rooted CQs, certAns(q, K) and K ans(q, Ican ) coincide. Specifically, we illustrate how the rewriting algorithm from [4,5] can be mod- ified by running it on the same example as the one used in [4,5]. Whenever relevant, we will explicitly point out the differences between the two algorithms. The rewriting from [4,5] generates a set of queries that can be directly translated into a SQL query. The target language for this rewriting makes use of nested aggregation in the form of special “atoms” of the form ∃=k .P (x, y), with k ∈ N, which intuitively correspond to a SQL COUNT DISTINCT operation, together with a boolean condition stating that the result of the aggregation operation over y must be equal to k. If the TBox contains an axiom the form C v ≥n R, then for each k ∈ [1..n], the rewriting of [4,5] generates one sub-query. Hence, the number of generated sub-queries is linear in n, and exponential in the binary encoding of n, which makes it unpractical. In our variant, we drop the boolean condition and use instead the nota- tion z = count(y).P (x, y), where z is now a variable storing the result of the same aggregation operation. Like in [4,5], we use negation, multiplication, and subtraction. In addition, our target language also requires the SQL function greatest(x,y) (that returns the largest value between x and y), and the aggre- gation operator SUM. We show through our running example that, despite our modifications, the target language for the rewriting can still be translated into SQL. Rewriting Count Queries over DL-Lite TBoxes with Number Restrictions 9 a P2 P1 P1 P2 d P2 A b P2 e P2 P2 Fig. 2. Chase model of our running example. Solid arrows represent the information in the ABox, whereas dashed lines represent information implied by the KB. Consider the following KB K = hT , Ai and CQ q:     A v ≥ 2 P1 , A(a), P1 (a, b), T = A = ∃P1− v ≥3 P2 P2 (b, d), P2 (b, e) q(x) ← A(x), P1 (x, y1 ), P2 (y1 , y2 ) K The chase model Ican of K is represented in Figure 2. The rewriting algorithm operates on a set Q of queries. This set is initialized with a syntactic variant of the input query: hQ(x, cnt = count(y1 , y2 )), {q(x, y1 , y2 ) ← A(x), P1 (x, y1 ), P2 (y1 , y2 )}i Intuitively, such query corresponds to the following SQL query4 SELECT x , COUNT ( DISTINCT y1 , y2 ) as cnt FROM A , P1 , P2 WHERE A . x = P1 . x A AND P1 . y1 = P2 . y1 GROUP BY x Then each rewriting step selects a query Q in Q, and extends Q with a set of new queries, obtained by applying some rewriting rule to Q, until saturation is reached. In the previous query, since variable y2 is unbound, we can apply a rewriting step over atom P2 (y1 , y2 ) with respect to axiom ∃P1− v ≥3 P2 , producing the query: hQ(x, cnt = sum(num)), hQ(x, y1 , num = greatest(0, 3 − z)), {q(x, y1 ) ← A(x), P1 (x, y1 ), P1 (_, y1 )} ∧ z = cnt(y2 ). P2 (y1 , y2 )ii This query corresponds to the following SQL query: SELECT x , SUM ( num ) as cnt FROM ( SELECT x , y1 , greatest (0 , 3 - z ) as num FROM ( SELECT x , y1 FROM A , P1 , ( SELECT x as _ , y1 FROM P1 ) as P1a ) WHERE A . x = P1 . x AND P1 . y1 = P1a . y1 ) AS J1 , ( SELECT y1 , COUNT ( DISTINCT y2 ) as z FROM P2 GROUP BY y1 ) AS J2 WHERE J1 . y1 = J2 . y1 ) GROUP BY x 4 Note that the COUNT DISTINCT operator of SQL does not allow for multiple arguments. However, the desired result can be achieved through an injective concatenation op- erator, for instance making use of an additional fresh symbol. 10 Calvanese et al. The interested reader might observe that such query fully captures the three queries (one for each non-zero value of num) from [4,5]. On such a query, one can apply a variant of the “reduce” rule of PerfectRef [6], leading to the query: hQ(x, cnt = sum(num)), hQ(x,  y1 , num = greatest(0, 3 − z)),  q(x, y1 ) ← A(x), P1 (x, y1 ), P1 (_, y1 ) ∧ z = cnt(y2 ). P2 (y1 , y2 )ii q(x, y1 ) ← A(x), P1 (x, y1 ) This query corresponds to the following SQL query: SELECT x , SUM ( num ) as cnt FROM ( SELECT x , y1 , greatest (0 ,3 - z ) as num FROM (( SELECT x , y1 FROM A , P1 , ( SELECT x as _ , y1 FROM P1 ) AS P1a WHERE A . x = P1 . x AND P1 . y1 = P1a . y1 ) UNION ( SELECT x , y1 FROM A , P1 WHERE A . x = P1 . x ) ) AS J1 , ( SELECT y1 , COUNT ( DISTINCT y2 ) as z FROM P2 GROUP BY y1 ) AS J2 WHERE J1 . y1 = J2 . y1 ) GROUP BY x Again, in the rewriting from [4,5], this very step produces three different queries: one for each non-zero value of num. By applying another rewriting step over atom P1 (x, y1 ) and with respect to axiom A v ≥2 P1 , we obtain the following query: hQ(x, cnt = sum(num)), hQ(x, num = (greatest(0, (2 − z) · 3), {q(x) ← A(x)} ∧ z = cnt(y1 ). P1 (x, y1 )ii This query corresponds to the following SQL query: SELECT x , SUM ( num ) as cnt FROM ( SELECT x , greatest (0 ,2 - z ) * 3 as num FROM ( SELECT x FROM A ) AS J1 , ( SELECT x , COUNT ( DISTINCT y1 ) AS z FROM P1 GROUP BY x ) AS J2 WHERE J1 . x = J2 . x ) GROUP BY x Let us now analyze the queries produced by the rewriting. The query after the initialization step returns the number of paths (x, y1 , y2 ) in A conforming to the structure dictated by the body of the input query. Since there are two such paths, this query returns the answer {x 7→ a, cnt 7→ 2}. The query produced after the first rewriting step checks for all sub-paths (x, y1 ) of (x, y1 , y2 ) such that x is an instance of A, y1 is a P1 -successor of x, and y1 has less P2 -successors in the K ABox than what the TBox prescribes. There is one such path in Ican , namely Rewriting Count Queries over DL-Lite TBoxes with Number Restrictions 11 the one terminating in node b that has only two P2 -successors in A. This path is captured by our query, which returns as answer {x 7→ a, cnt 7→ 1}: indeed, there is a single way of extending this path into the anonymous part. The last query has to be interpreted in a similar way. The query retrieves the individual a, since this node has only one P1 -successor in A but should have at least two P1 -successors according to T . The answer to such query is {x 7→ a, cnt 7→ 3}. Indeed, there are three ways of extending the match {x 7→ a} into the anonymous part. Now, a last aggregation (with SUM) over all queries in Q yields the desired answer {x 7→ a, cnt 7→ 6}, which indeed corresponds to the answer h{x 7→ a}, 6i K to our input query over Ican . 5 Conclusions In this paper, we have improved the query rewriting technique proposed in [4,5] – for rooted CQs under count semantics over a DL-LiteN core KB, into a target query language that extends RA with aggregation and arithmetic functions. We have illustrated how the exponential blow-up of the output query in the numbers that occur in the TBox, characteristic of the rewriting of [4,5], can be avoided by extending the target rewriting language, specifically with the aggregate operator sum and with the binary function greatest. We also show that this enriched language can still be translated into SQL. Considering that numeric values that appear in the TBox may in practice be of the same order of magnitude as the size of the ABox, this new rewriting algorithm is a significant step towards a practical implementation of query answering under this semantics. Acknowledgements This research has been partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallen- berg Foundation, by the Italian Basic Research (PRIN) project HOPE, by the EU H2020 project INODE, grant agreement 863410, by the CHIST-ERA project PACMEL, by the project IDEE (FESR1133) funded through the European Re- gional Development Fund (ERDF) Investment for Growth and Jobs Programme 2014-2020, by the project “South Tyrol Longitudinal study on Longevity and Ageing” (STyLoLa), funded through the 2017 Interdisciplinary Call issued by the Research Committee of the Free University of Bozen-Bolzano, and by the project “Ontology-based Geodata Integration, Visualization and Analysis” (On- toGeo), funded through the 2018 CRC Call issued by the Research Committee of the Free University of Bozen-Bolzano. 12 Calvanese et al. References 1. A. Artale, D. Calvanese, R. Kontchakov, and M. Zakharyaschev. The DL-Lite family and relations. J. of Artificial Intelligence Research, 36:1–69, 2009. 2. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider, ed- itors. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, 2003. 3. M. Bienvenu and M. Ortiz. Ontology-mediated query answering with data- tractable description logics. In Reasoning Web: Web Logic Rules – 11th Int. Summer School Tutorial Lectures (RW), volume 9203 of LNCS, pages 218–307. Springer, 2015. 4. D. Calvanese, J. Corman, D. Lanti, and S. Razniewski. Counting query answers over a DL-Lite knowledge base. In Proc. of the 29th Int. Joint Conf. on Artificial Intelligence (IJCAI). IJCAI Org., 2020. 5. D. Calvanese, J. Corman, D. Lanti, and S. Razniewski. Counting query an- swers over a DL-Lite knowledge base (Extended version). CoRR Tech. Rep. arXiv:2005.05886, arXiv.org e-Print archive, 2020. Available at http://arxiv.org/ abs/2005.05886. 6. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning, 39(3):385–429, 2007. 7. D. Calvanese, G. De Giacomo, and M. Lenzerini. Conjunctive query containment and answering under description logics constraints. ACM Trans. on Comp. Logic, 9(3):22.1–22.31, 2008. 8. D. Calvanese, E. Kharlamov, W. Nutt, and C. Thorne. Aggregate queries over ontologies. In Proc. of the 2nd Int. Workshop on Ontologies and Inf. Systems for the Semantic Web (ONISW), pages 97–104, 2008. 9. H. Chen and S. Mengel. Counting answers to existential positive queries: a com- plexity classification. In Proc. of the 35th ACM Symp. on Principles of Database Systems (PODS), pages 315–326, 2016. 10. S. Grumbach and T. Milo. Towards tractable algebras for bags. J. of Computer and System Sciences, 52(3):570–588, 1996. 11. E. V. Kostylev and J. L. Reutter. Complexity of answering counting aggregate queries over DL-Lite. J. of Web Semantics, 33:94–111, 2015. 12. L. Libkin and L. Wong. Query languages for bags and aggregate functions. J. of Computer and System Sciences, 55(2):241–272, 1997. 13. B. Motik, B. Cuenca Grau, I. Horrocks, Z. Wu, A. Fokoue, and C. Lutz. OWL 2 Web Ontology Language profiles (2nd ed.). W3C Rec., W3C, 2012. http://www. w3.org/TR/owl2-profiles/. 14. C. Nikolaou, E. V. Kostylev, G. Konstantinidis, M. Kaminski, B. Cuenca Grau, and I. Horrocks. Foundations of ontology-based data access under bag semantics. Artificial Intelligence, 274:91–132, 2019. 15. R. Pichler and S. Skritek. Tractable counting of the answers to conjunctive queries. J. of Computer and System Sciences, 79(6):984–1001, 2013. 16. G. Xiao, D. Calvanese, R. Kontchakov, D. Lembo, A. Poggi, R. Rosati, and M. Za- kharyaschev. Ontology-based data access: A survey. In Proc. of the 27th Int. Joint Conf. on Artificial Intelligence (IJCAI), pages 5511–5519. IJCAI Org., 2018.