Rewriting Count Queries over DL-Lite TBoxes with Number Restrictions

Diego Calvanese

diego.calvanese@umu.se 0 2

Julien Corman

Davide Lanti

Simon Razniewski

0 Free University of Bozen-Bolzano, Italy
1 Max-Planck-Institut für Informatik, Saarbrücken, Germany
2 Umeå University, Sweden

We propose a query rewriting algorithm for a restricted class of conjunctive queries evaluated under count semantics over a DL-Lite knowledge base. The target query language is an extension of relational algebra with aggregation and arithmetic functions, which can be translated into SQL. The algorithm supports number restrictions on the RHS of axioms in the input TBox, which can be used to encode statistics. The size of the output query remains linear in the binary encoding of these numbers, which improves upon previously proposed approaches.

1 Introduction

Commons License Attribution 4.0 International (CC BY 4.0)

1 Also referred to as OBDA (for Ontology-Based Data Access), when emphasis is placed on mappings connecting external data sources to a TBox [16].

And an ABox of the form:

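$\mathit{Uni}(\mathit{UniBZ}),\ \mathit{BigUni}(\mathit{UW}),\ \mathit{BigUni}(\mathit{TUW}),\ \ldots,\ \mathit{region}(\mathit{UniBZ}, \mathit{SouthTyrol}),\ \mathit{region}(\mathit{UW}, \mathit{Vienna}),\ \mathit{region}(\mathit{TUW}, \mathit{Vienna}),\ \ldots,\ \mathit{enrolledIn}(\mathit{Alice}, \mathit{UniBZ}),\ \mathit{enrolledIn}(\mathit{Bob}, \mathit{UW}),\ \mathit{enrolledIn}(\mathit{Carol}, \mathit{TUW}),\ \ldots$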

In practice, the information about enrollment in such a KB may be incomplete.

First, aggregated data about a whole university may be available, but not the detailed records. The opposite may also hold: detailed records may be provided by a university, but the central repository may not have an aggregated value for it. In such scenarios, even if the data is incomplete, one may still want to compute a lower bound on the number of enrollments of students within each region. In more formal terms, one might be interested in counting the number of answers, as in the following query:

$q(\mathit{count}(x, y)) \leftarrow \mathit{enrolledIn}(x, y) \wedge \mathit{region}(y, \mathit{Vienna})$

Answering such a query may require arithmetic operations that combine numbers present in number restrictions in the TBox, and numbers obtained by counting ABox statements. For instance, in this example, assume that TUW and UW are the only two universities in the KB that are registered in the region of Vienna. Assume also that UW does not share enrollment records, whereas TUW shares them, although they may be incomplete (for instance, because some faculties within TUW have not yet communicated their enrollment information). Then, the output value for $\mathit{count}(x, y)$ should be the sum of 20 000 on the one hand (for UW), and the largest value between 20 000 and $n$ on the other hand (for TUW), where $n$ is the number of enrollment records shared by TUW.
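The arithmetic behind this lower bound can be sketched in a few lines of Python. This is only an illustration of the example above, not part of the rewriting algorithm: the function name `region_lower_bound` and the dictionary layout are hypothetical, and a university without any shared record is treated as having 0 recorded enrollments.

```python
def region_lower_bound(declared_minimum, recorded_enrollments):
    """Lower bound on the number of enrollments in one region.

    declared_minimum: university -> minimum stated by a number restriction (TBox)
    recorded_enrollments: university -> number of enrollment records (ABox)
    Both arguments are hypothetical inputs chosen for this illustration.
    """
    total = 0
    for university, minimum in declared_minimum.items():
        recorded = recorded_enrollments.get(university, 0)
        # Per university, keep the larger of the declared and the recorded value.
        total += max(minimum, recorded)
    return total


vienna_minimum = {"UW": 20000, "TUW": 20000}
print(region_lower_bound(vienna_minimum, {"TUW": 25000}))  # 45000
print(region_lower_bound(vienna_minimum, {"TUW": 12000}))  # 40000
```

With 25 000 records for TUW the records dominate the declared minimum, whereas with 12 000 records the number restriction does.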

Query answering over DL KBs has been extensively studied for boolean and enumeration queries. With respect to the focus of this paper, a relevant result [6] is that for conjunctive queries (CQs) and unions of CQs (UCQs), answering a query over a KB expressed in certain lightweight DLs can be performed by rewriting it, according to the axioms in the TBox, into a relational algebra (RA) query over the ABox. This property, called FO rewritability, allows one to use an RDBMS for the evaluation of the rewritten query, thus leveraging already mature optimization techniques implemented in such systems. Among these DLs, the DL-Lite family [1,6] has been widely studied and adopted in OMQA/OBDA systems, resulting in the OWL 2 QL standard [13].

In comparison, the problem of counting answers to a CQ over a DL-Lite KB has seen little interest in the literature. A seminal result was provided in [11], showing that the problem is significantly harder already for relatively inexpressive DLs, with a shift from $AC^0$-membership to coNP-completeness in data complexity, i.e., assuming the sizes of the query and TBox to be fixed. Another key result was provided in [14], but for a slightly different semantics called bag semantics: for certain variants of DL-Lite, CQs that are rooted (i.e., with at least one constant or answer variable) can be rewritten into a variant of bag RA. However, this result is provided for DLs without number restrictions (like $\geq 1000\, \mathit{enrolledIn}^-$ in the example above). In addition, because bag semantics differs from the count semantics studied in [11], this rewritability result does not directly apply to our setting. Finally, in [4,5], it was shown that under count semantics (and for some variants of DL-Lite), rooted CQs (see footnote 2) can be rewritten into a variant of RA with aggregation and arithmetic functions. However, the provided rewriting algorithm may yield a query whose size is polynomial in the values of the numbers that occur in the TBox, i.e., exponential in the binary representation of such numbers, which is impractical.

In this paper, we improve upon this last result, showing how for the same DL, and under count semantics, rooted CQs can be rewritten into a different extension of RA, such that the number of algebraic operators in the output query does not depend on the numbers that occur in the TBox, but only on the number of axioms of the TBox. As a consequence, the size of the resulting query is polynomial in the size of the encoding of the numbers that appear in the number restrictions of the TBox, even when such numbers are represented in binary. This is an important step towards efficient query answering in this setting.

2 Preliminaries

We assume mutually disjoint countably infinite sets $N_I$ of individuals (a.k.a. constants), $N_E$ of anonymous individuals (induced by existential quantification), $N_V$ of variables, $N_C$ of concept names (i.e., unary predicates, denoted with $A$), and $N_R$ of role names (i.e., binary predicates, denoted with $P$). An atom is an expression of the form $A(s)$ or $P(s, t)$, with $A \in N_C$, $P \in N_R$, and $s, t \in N_I \cup N_E \cup N_V$. In the following, boldface letters like $\mathbf{t}$ denote tuples, and we sometimes treat tuples as sets. If $\mathbf{t} = (t_1, \ldots, t_n)$ is a tuple and $f$ a function defined for each $t_i$, we use $f(\mathbf{t})$ to denote the tuple $(f(t_1), \ldots, f(t_n))$.

An interpretation $\mathcal{I}$ is a FO structure $\langle \Delta^{\mathcal{I}}, \cdot^{\mathcal{I}} \rangle$, where the domain $\Delta^{\mathcal{I}}$ is a non-empty subset of $N_I \cup N_E$, and the interpretation function $\cdot^{\mathcal{I}}$ maps each constant $c \in N_I$ to itself ($c^{\mathcal{I}} = c$, or in other words, we adopt the standard names assumption), each concept name $A \in N_C$ to a set $A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$, and each role name $P \in N_R$ to a binary relation $P^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$.

If $\mathcal{I} = \langle \Delta^{\mathcal{I}}, \cdot^{\mathcal{I}} \rangle$ is an interpretation, we may use $\mathcal{I}$ to denote the set of atoms $\{A(s) \mid A \in N_C, s \in A^{\mathcal{I}}\} \cup \{P(s, t) \mid P \in N_R \text{ and } (s, t) \in P^{\mathcal{I}}\}$. Conversely, if $S$ is a set of unary and binary atoms with arguments in $N_I \cup N_E$, we may view it as the interpretation $\langle \Delta^{S}, \cdot^{S} \rangle$, where $\Delta^{S}$ is the union of the arguments of all atoms in $S$, and $\cdot^{S}$ is defined by $A^{S} = \{s \mid A(s) \in S\}$, for $A \in N_C$, and $P^{S} = \{(s, t) \mid P(s, t) \in S\}$, for $P \in N_R$.

A KB is a pair $\mathcal{K} = \langle \mathcal{T}, \mathcal{A} \rangle$, where $\mathcal{A}$, called ABox, is a finite set of atoms with arguments in $N_I$, and $\mathcal{T}$, called TBox, is a finite set of axioms. In this paper, we focus on (fragments of) $\textit{DL-Lite}_{core}^{\mathcal{HN}^-}$ [4,5], a member of the DL-Lite family [1] where each axiom is a concept inclusion of the form $B \sqsubseteq C$ or a role inclusion of the form $R_1 \sqsubseteq R_2$, following the grammar of Figure 1. From now on, we use the symbols $B$, $C$, $P$ and $R$ in accordance with this grammar, and we call these elements respectively basic concepts, concepts, atomic roles and roles. Concepts of the form $\geq n\, R$ are called number restrictions. The fragment of $\textit{DL-Lite}_{core}^{\mathcal{HN}^-}$ that disallows role inclusions and number restrictions where $n > 1$ is called $\textit{DL-Lite}_{core}$ [1].

[Figure 1: Syntax of $\textit{DL-Lite}_{core}^{\mathcal{HN}^-}$ roles and concepts; recoverable productions: $R \to P \mid P^-$ and $B \to A \mid \ldots$]

2 Precisely, the result provided in [4,5] holds for rooted and connected CQs. This result immediately extends to rooted but disconnected CQs, by observing that the definition of rootedness requires each connected component to be rooted. Therefore, to answer such a CQ, it suffices to compute the answers to each connected component separately, and then the Cartesian product of these answers, multiplying the obtained cardinalities.

The evaluation $C^{\mathcal{I}}$ (resp. $R^{\mathcal{I}}$) of a concept $C$ (resp. role $R$) over an interpretation $\mathcal{I}$ is defined inductively over the structure of $C$ (resp. $R$) as usual (see [2] for a definition). An interpretation $\mathcal{I}$ is a model of $\langle \mathcal{T}, \mathcal{A} \rangle$ iff $\mathcal{A} \subseteq \mathcal{I}$ holds, and $B^{\mathcal{I}} \subseteq C^{\mathcal{I}}$ (resp. $R_1^{\mathcal{I}} \subseteq R_2^{\mathcal{I}}$) holds for each concept inclusion (resp. role inclusion) axiom $B \sqsubseteq C$ (resp. $R_1 \sqsubseteq R_2$) in $\mathcal{T}$. A KB is satisfiable iff it admits at least one model. For readability, in what follows we focus on satisfiable KBs, that is, we use “a KB” as a shortcut for “a satisfiable KB”.

A conjunctive query (CQ) $q$ is an expression of the form

$q(\mathbf{x}) \leftarrow p_1(\mathbf{t}_1), \ldots, p_n(\mathbf{t}_n)$,

where $\mathbf{x} \subseteq N_V$, and each $p_i(\mathbf{t}_i)$ is an atom with arguments in $N_V \cup N_I$. In addition, we require safeness, i.e., $\mathbf{x} \subseteq \mathbf{t}_1 \cup \cdots \cup \mathbf{t}_n$.

We use $\mathit{dist}(q)$ to denote $\mathbf{x}$, called the distinguished variables of $q$, and we use $\mathit{body}(q)$ to denote $\{p_1(\mathbf{t}_1), \ldots, p_n(\mathbf{t}_n)\}$. The Gaifman graph $G_q$ of $q$ is the undirected graph whose vertices are the variables appearing in $\mathit{body}(q)$, and that contains an edge between $x_1$ and $x_2$ iff $P(x_1, x_2) \in \mathit{body}(q)$ for some role $P$. Following [14], we call a CQ $q$ rooted if each connected component in $G_q$ contains at least one constant or distinguished variable.
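The rootedness condition can be checked mechanically. The Python sketch below is one possible reading of the definition, given here only for illustration: since the vertices of the Gaifman graph are variables, we assume that a component "contains a constant" when one of its variables shares a binary atom with a constant. The function name `is_rooted` and the encoding of atoms as `(predicate, args)` pairs are hypothetical.

```python
from collections import defaultdict


def is_rooted(distinguished, body, variables):
    """Check rootedness of a CQ.

    distinguished: iterable of distinguished variables
    body: list of atoms (predicate, args), args being a tuple of terms
    variables: set of the terms that are variables (other terms are constants)
    """
    neighbours = defaultdict(set)   # edges of the Gaifman graph
    touches_constant = set()        # variables sharing a role atom with a constant
    vertices = set()
    for _predicate, args in body:
        vertices.update(t for t in args if t in variables)
        if len(args) == 2:
            s, t = args
            if s in variables and t in variables:
                neighbours[s].add(t)
                neighbours[t].add(s)
            elif s in variables:
                touches_constant.add(s)
            elif t in variables:
                touches_constant.add(t)

    roots = set(distinguished) | touches_constant
    seen = set()
    for start in vertices:
        if start in seen:
            continue
        # Depth-first traversal collecting one connected component.
        component, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(neighbours[v] - component)
        seen |= component
        if not (component & roots):
            return False
    return True


# q(x) <- enrolledIn(x, y), region(y, Vienna): rooted
print(is_rooted(("x",), [("enrolledIn", ("x", "y")), ("region", ("y", "Vienna"))], {"x", "y"}))
# q() <- A2(y): not rooted
print(is_rooted((), [("A2", ("y",))], {"y"}))
```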

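We adopt the semantics proposed in [11] for counting answers to a CQ over a KB, which we call count semantics. If $f$ is a function and $D$ a subset of its domain, then we use $f|_D$ to denote $f$ restricted to $D$. A match for a CQ $q$ over an interpretation $\mathcal{I}$ is a homomorphism from $\mathit{body}(q)$ to $\mathcal{I}$. Let $M$ be the set of matches for $q$ over $\mathcal{I}$, let $\omega$ be a mapping from $\mathit{dist}(q)$ to $N_I$, and let $\Lambda = \{\rho \in M \mid \rho|_{\mathit{dist}(q)} = \omega\}$. Then the pair $\langle \omega, |\Lambda| \rangle$ is an answer to $q$ over $\mathcal{I}$ iff $|\Lambda| \geq 1$. We use $\mathit{ans}(q, \mathcal{I})$ to denote the set of answers to $q$ over $\mathcal{I}$. Finally, a pair $\langle \omega, k \rangle$ is a certain answer to $q$ over a KB $\mathcal{K}$ (under count semantics) iff $k$ is the largest element of $\mathbb{N} \cup \{+\infty\}$ such that, for each model $\mathcal{I}$ of $\mathcal{K}$, $\langle \omega, k_{\mathcal{I}} \rangle \in \mathit{ans}(q, \mathcal{I})$ for some $k_{\mathcal{I}} \geq k$. We use $\mathit{certAns}(q, \mathcal{K})$ to denote the set of certain answers to $q$ over $\mathcal{K}$.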
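To make the notion of answer over a single interpretation concrete, the following Python sketch enumerates all matches by brute force and groups them by their restriction to the distinguished variables. It is a naive illustration rather than an efficient procedure, and it assumes a finite interpretation whose elements are all named individuals; the function name `answers` and the tuple-based encoding of atoms are hypothetical.

```python
from collections import Counter
from itertools import product


def answers(distinguished, body, variables, interpretation):
    """Answers to a CQ over one finite interpretation, under count semantics.

    body: list of atoms (predicate, args); interpretation: set of ground atoms
    in the same encoding. Returns a dict mapping each tuple of values for the
    distinguished variables to the number of matches that agree with it.
    """
    domain = sorted({t for _predicate, args in interpretation for t in args})
    ordered_vars = sorted(variables)
    counts = Counter()
    for values in product(domain, repeat=len(ordered_vars)):
        rho = dict(zip(ordered_vars, values))
        # Constants are mapped to themselves (standard names assumption).
        image = {(p, tuple(rho.get(t, t) for t in args)) for p, args in body}
        if image.issubset(interpretation):
            omega = tuple(rho[x] for x in distinguished)
            counts[omega] += 1
    return dict(counts)


interpretation = {
    ("enrolledIn", ("Alice", "TUW")),
    ("enrolledIn", ("Bob", "TUW")),
    ("enrolledIn", ("Carol", "UW")),
    ("region", ("TUW", "Vienna")),
    ("region", ("UW", "Vienna")),
}
body = [("enrolledIn", ("x", "y")), ("region", ("y", "Vienna"))]
print(answers(("y",), body, {"x", "y"}, interpretation))  # {('TUW',): 2, ('UW',): 1}
print(answers((), body, {"x", "y"}, interpretation))      # {(): 3}
```

Certain answers additionally require taking, for each mapping, the minimum of these counts over all models of the KB, which this sketch does not attempt.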

A property that has been extensively studied in the OBDA/OMQA literature is FO rewritability [1,6]. Query answering for a class $\mathcal{Q}$ of queries is FO rewritable for a DL $\mathcal{L}$ iff, for every $\mathcal{L}$ TBox $\mathcal{T}$ and $\mathcal{Q}$-query $q$, there is a FO query $q'$ such that, for every ABox $\mathcal{A}$, the certain answers to $q$ over $\langle \mathcal{T}, \mathcal{A} \rangle$ and the answers to $q'$ over $\mathcal{A}$ (viewed as an interpretation) coincide. In particular, under standard certain answer semantics for DLs (which differs from the definition of $\mathit{certAns}(q, \mathcal{K})$ under count semantics given above), query answering for UCQs is FO rewritable for several members of the DL-Lite family. Since RA captures (domain-independent) FO logic, the evaluation of the query obtained via rewriting can be delegated to an RDBMS. Then in [14], this notion of rewritability has been adapted to a different target query language, called BCALC, which captures relational bag algebra [10], and can be translated into SQL queries with aggregation.

3 Related Work

Query answering under count semantics over a relational DB can be viewed as a specific case of query answering under bag semantics, investigated notably in [10,12]. Instead, in our setting, and in line with the OMQA/OBDA literature, we assume that the input ABox is a set rather than a bag. The counting problem over sets has also been studied recently in the relational DB setting [9,15], but from the perspective of combined complexity, where the query is considered part of the input (i.e., its size is not assumed to be fixed).

As for (DL-Lite) KBs, in [8] an alternative (epistemic) count semantics is defined, which counts over all grounded tuples (i.e., over $N_I$) entailed by the KB. Such a semantics does not account for existentially implied individuals, and thus cannot capture the statistics motivating our work.

Closer to our concern is the work presented in [11], which introduces the count semantics for CQs over a DL KB adopted in this paper. In particular, the Count problem, which is the decision variant of query answering under count semantics, was shown in [11] to be coNP-hard in data complexity for the relatively inexpressive DL $\textit{DL-Lite}_{core}$, even when negated concepts are forbidden. This implies that for this DL and arbitrary CQs, query answering under count semantics is not rewritable into a target query language for which query answering is easier than coNP in data complexity.

Then, rewritability of CQs over a DL-Lite KB was investigated in [14] for the related problem of query answering under bag semantics. In particular, a rewriting algorithm was provided for rooted CQs into the language BCALC (which can be evaluated in $TC^0$), and for a DL that extends $\textit{DL-Lite}_{core}$ with a restricted form of role subsumption. This result does not immediately transfer to our setting though, for two reasons. First, this DL cannot express number restrictions on the right-hand side (RHS) of TBox axioms, and therefore cannot encode statistics like the ones from our motivating example. Second, as shown in [14], for $\textit{DL-Lite}_{core}$ already, bag and count semantics differ in the presence of existential quantification on the left-hand side (LHS) of TBox axioms, which is typically used to express the domain and range of an atomic role.

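To understand this difference, we recall the basics of query answering under bag semantics (and refer to [14] for a complete definition). A bag interpretation $\mathcal{I}$ maps each atomic concept $A$ (resp. atomic role $P$) to a bag $A^{\mathcal{I}}$ (resp. $P^{\mathcal{I}}$). Such a bag can be seen as a function that maps each element of $\Delta^{\mathcal{I}}$ (resp. $\Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$) to the number of times it occurs in the bag. This function $\cdot^{\mathcal{I}}$ is then extended to complex concepts and roles inductively (see [14]). And $\mathcal{I}$ satisfies an axiom $C_1 \sqsubseteq C_2$ (resp. $R_1 \sqsubseteq R_2$) iff $(C_1)^{\mathcal{I}}(d) \leq (C_2)^{\mathcal{I}}(d)$ for every $d \in \Delta^{\mathcal{I}}$ (resp. $(R_1)^{\mathcal{I}}(d_1, d_2) \leq (R_2)^{\mathcal{I}}(d_1, d_2)$ for every $(d_1, d_2) \in \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$).

Now let $q(\mathbf{x}) \leftarrow p_1(\mathbf{t}_1), \ldots, p_n(\mathbf{t}_n)$ be a CQ, let $\mathcal{I}$ be a bag interpretation, let $V$ be the set of variables that appear in $q$, let $\omega$ be a mapping from $\mathbf{x}$ to $N_I$, let $\Lambda = \{\rho : V \to \Delta^{\mathcal{I}} \mid \rho|_{\mathit{dist}(q)} = \omega\}$, and let $k = \sum_{\rho \in \Lambda} \prod_{1 \leq i \leq n} (p_i)^{\mathcal{I}}(\rho(\mathbf{t}_i))$. Then $\langle \omega, k \rangle$ is a bag answer to $q$ over $\mathcal{I}$ iff $k \geq 1$. We use $\mathit{bagAns}(q, \mathcal{I})$ to denote the set of bag answers to $q$ over $\mathcal{I}$. And if $\mathcal{K}$ is a KB, we use $\mathit{bagCertAns}(q, \mathcal{K})$ to denote the certain answers to $q$ over $\mathcal{K}$ under bag semantics, defined analogously to certain answers under count semantics (see footnote 3).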

The difference between bag and count semantics is illustrated in [14] with the following example:

Example 1. Consider the KB $\mathcal{K} = \langle \mathcal{T}, \mathcal{A} \rangle$, where

$\mathcal{T} = \{A_1 \sqsubseteq \exists P,\ \exists P^- \sqsubseteq A_2\}$,

$\mathcal{A} = \{A_1(a), A_1(b)\}$,

and the query $q() \leftarrow A_2(y)$. If we evaluate this query under count semantics, then $\mathit{certAns}(q, \mathcal{K}) = \{\langle \{\}, 1 \rangle\}$ (i.e., the only certain answer to $q$ is the mapping with empty domain $\{\}$, with multiplicity 1), because the following structure is a model of $\mathcal{K}$:

[Figure: $a$ (labelled $A_1$) and $b$ (labelled $A_1$) each have a $P$-edge to the same element $u$ (labelled $A_2$).]

However, such a structure does not accurately represent a bag interpretation. Let us now build a (minimal) bag interpretation $\mathcal{I}$ for $\mathcal{K}$. To satisfy $\mathcal{A}$, we set $A_1^{\mathcal{I}}(a) = 1$ and $A_1^{\mathcal{I}}(b) = 1$. Then to satisfy $A_1 \sqsubseteq \exists P$, we introduce a single element $u$ (as above) and obtain $P^{\mathcal{I}}(a, u) = 1$ and $P^{\mathcal{I}}(b, u) = 1$. Therefore $(P^-)^{\mathcal{I}}(u, a) = 1$ and $(P^-)^{\mathcal{I}}(u, b) = 1$, which, according to the semantics proposed in [14], imply that $(\exists P^-)^{\mathcal{I}}(u) = 2$. Then in order for this bag interpretation to satisfy $\exists P^- \sqsubseteq A_2$, it must be the case that $(A_2)^{\mathcal{I}}(u) = 2$. So the only certain answer to $q$ over $\mathcal{K}$ under bag semantics is the empty mapping with multiplicity 2, i.e., $\mathit{bagCertAns}(q, \mathcal{K}) = \{\langle \{\}, 2 \rangle\}$.
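The two multiplicities of Example 1 can be recomputed with a short Python sketch. It only replays the construction described above for this specific KB (it is not a general reasoner), and the variable names are ours.

```python
from collections import Counter

# Minimal bag interpretation of Example 1: multiplicities of the role P.
P = Counter({("a", "u"): 1, ("b", "u"): 1})

# Multiplicity of each element in the bag for ∃P⁻: the total multiplicity
# of the P-edges that end in that element.
exists_P_inverse = Counter()
for (_source, target), multiplicity in P.items():
    exists_P_inverse[target] += multiplicity

# The axiom ∃P⁻ ⊑ A2 forces the multiplicity of A2 to be at least as large;
# a minimal interpretation takes exactly that value.
A2 = Counter(exists_P_inverse)

# q() <- A2(y): bag semantics sums multiplicities, count semantics counts matches.
bag_answer = sum(A2.values())
count_answer = sum(1 for m in A2.values() if m >= 1)

print(bag_answer, count_answer)  # 2 1
```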

It was also shown in [14] that query answering under bag and count semantics coincide for the DL that extends $\textit{DL-Lite}_{core}$ with role inclusion, but disallows concepts of the form $\geq 1\, R$ on the LHS of axioms, which we denote here with $\textit{DL-Lite}_{core}^{\mathcal{H}\nexists}$. However, our focus is once again on logics that allow for number restrictions on the RHS of axioms (and thus can encode statistics). So a natural question is whether this result still holds for $\textit{DL-Lite}_{core}^{\mathcal{H}\nexists}$ extended with such axioms. Let $\textit{DL-Lite}_{core}^{\mathcal{HN}^-\nexists}$ denote this DL (equivalently, $\textit{DL-Lite}_{core}^{\mathcal{HN}^-\nexists}$ is the fragment of $\textit{DL-Lite}_{core}^{\mathcal{HN}^-}$ that disallows concepts of the form $\geq 1\, R$ on the LHS of axioms). To answer this question, the bag semantics proposed in [14] needs to be extended to concepts of the form $\geq n\, R$. One way to do this is to define $(\geq n\, R)^{\mathcal{I}}(d_1) = \left(\sum_{d_2 \in \Delta^{\mathcal{I}}} R^{\mathcal{I}}(d_1, d_2)\right) \div n$, for any bag interpretation $\mathcal{I}$ and $d_1 \in \Delta^{\mathcal{I}}$, where “$\div$” denotes integer division. With this definition, the proof provided in [14, Proposition 85] can be immediately extended to $\textit{DL-Lite}_{core}^{\mathcal{HN}^-\nexists}$:

3 The notation used in [14] for certain answers under both bag and count semantics is slightly different from ours. Let $\mathit{dist}(q) = \{x_1, \ldots, x_n\}$. Then the certain answers to $q$ under bag semantics are represented in [14] as a partial function $\mu : (N_I)^n \to \mathbb{N}^+$. Instead, we use $\mathit{bagCertAns}(q, \mathcal{K}) = \{\langle \{x_1 \mapsto c_1, \ldots, x_n \mapsto c_n\}, k \rangle \mid \mu(c_1, \ldots, c_n) = k\}$. As for count semantics, let $\langle \omega, k \rangle \in \mathit{certAns}(q, \mathcal{K})$. Then the definition provided in Section 2 implies that there is no $k' \neq k$ such that $\langle \omega, k' \rangle \in \mathit{certAns}(q, \mathcal{K})$. Instead, in [11,14], $\langle \omega, k' \rangle$ is considered a certain answer under count semantics for each $1 \leq k' \leq k$.

Proposition 2. For any $\textit{DL-Lite}_{core}^{\mathcal{HN}^-\nexists}$ KB $\mathcal{K}$ and CQ $q$,

$\mathit{certAns}(q, \mathcal{K}) = \mathit{bagCertAns}(q, \mathcal{K})$

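Proof (sketch). The proof of Proposition 85 in [14] uses the fact that for any $\textit{DL-Lite}_{core}^{\mathcal{H}\nexists}$ KB $\mathcal{K}$ and model $\mathcal{I}$ of $\mathcal{K}$ under count semantics, there is a bag interpretation $\mathcal{I}_b$ that is a model of $\mathcal{K}$ under bag semantics, and verifies $\mathit{ans}(q, \mathcal{I}) = \mathit{bagAns}(q, \mathcal{I}_b)$ for any CQ $q$. This bag interpretation is defined for $M \in N_C \cup N_R$ by $M^{\mathcal{I}_b}(\mathbf{t}) = 1$ if $\mathbf{t} \in M^{\mathcal{I}}$, and $M^{\mathcal{I}_b}(\mathbf{t}) = 0$ otherwise. Now for any axiom of the form $A \sqsubseteq\ \geq n\, R$, it can be verified that if $A^{\mathcal{I}} \subseteq (\geq n\, R)^{\mathcal{I}}$, then $A^{\mathcal{I}_b}(d) \leq (\geq n\, R)^{\mathcal{I}_b}(d)$ holds, for any individual $d$. So if $\mathcal{K}$ is now a $\textit{DL-Lite}_{core}^{\mathcal{HN}^-\nexists}$ KB and $\mathcal{I}$ a model of $\mathcal{K}$, then $\mathcal{I}_b$ as defined above is still a model of $\mathcal{K}$ under bag semantics.

The proof of Proposition 85 in [14] also uses the fact that for any $\textit{DL-Lite}_{core}^{\mathcal{H}\nexists}$ bag KB $\mathcal{K}$ and model $\mathcal{I}$ of $\mathcal{K}$ under bag semantics, there is an interpretation $\mathcal{I}_s$ that is a model of $\mathcal{K}$ under count semantics, and such that for any CQ $q$, mapping $\omega$ over $\mathit{dist}(q)$ and $k \in \mathbb{N}^+ \cup \{+\infty\}$, if $\langle \omega, k \rangle \in \mathit{bagAns}(q, \mathcal{I})$, then $\langle \omega, k' \rangle \in \mathit{ans}(q, \mathcal{I}_s)$ for some $k' \leq k$. This interpretation is defined for $M \in N_C \cup N_R$ by $\mathbf{t} \in M^{\mathcal{I}_s}$ iff $M^{\mathcal{I}}(\mathbf{t}) \geq 1$. Again, it can be verified that $\mathcal{I}_s$ is still a model of $\mathcal{K}$ if $\mathcal{K}$ is a $\textit{DL-Lite}_{core}^{\mathcal{HN}^-\nexists}$ KB.

The rest follows from the original proof. $\Box$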
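The integer-division semantics of number restrictions and the two conversions used in the proof sketch can be illustrated as follows. This is a minimal Python sketch under our own encoding (bags as `Counter` objects over tuples); the function names `at_least`, `set_to_bag` and `bag_to_set` are hypothetical and do not come from [14].

```python
from collections import Counter


def at_least(n, role_bag, d):
    """Multiplicity of d in the bag for (>= n R): total multiplicity of the
    R-edges leaving d, divided by n with integer division."""
    total = sum(m for (d1, _d2), m in role_bag.items() if d1 == d)
    return total // n


def set_to_bag(extension):
    """From a set-based extension to a bag: multiplicity 1 for each tuple."""
    return Counter({t: 1 for t in extension})


def bag_to_set(bag):
    """From a bag to a set: keep the tuples with multiplicity at least 1."""
    return {t for t, m in bag.items() if m >= 1}


R = {("a", "u1"), ("a", "u2"), ("a", "u3")}
R_bag = set_to_bag(R)
print(at_least(2, R_bag, "a"))  # 1: a has three R-successors, and 3 div 2 = 1
print(at_least(4, R_bag, "a"))  # 0: a has fewer than four R-successors
print(bag_to_set(R_bag) == R)   # True
```

In this example, $a$ belongs to the set-based extension of $\geq 2\, R$ exactly when its multiplicity in the corresponding bag is at least 1, which is the kind of agreement the proof sketch relies on.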

Finally, in [4,5], for $\textit{DL-Lite}_{core}^{\mathcal{N}^-}$ and under count semantics, a query rewriting algorithm was provided for rooted CQs into a variant of RA extended with aggregation and arithmetic functions. As for the rewriting of [14], the output query does not depend on the ABox. However, this algorithm is mostly of theoretical interest, and not well suited for implementation (see Section 4). Two negative results were also provided in [4,5], which further circumscribe the scope of rewritability. Specifically, for $\textit{DL-Lite}_{core}^{\mathcal{H}}$, Count was shown to be P-hard both for rooted CQs and for CQs whose body contains a single atom. This implies that for this DL and these classes of queries, query answering under count semantics is not rewritable into a target query language for which query answering is easier than P.

4 Rewriting

Query answering under count semantics was shown in [4,5] to be rewritable for rooted CQs and $\textit{DL-Lite}_{core}^{\mathcal{N}^-}$, into a target query language that extends RA with aggregation and some algebraic functions, thanks to a query rewriting algorithm inspired by PerfectRef [6]. However, the size of the rewritten query may be exponential in the size of the input TBox. More specifically, it may be exponential both in the number of axioms (precisely, in the depth of the deepest concept hierarchy that can be inferred from the TBox), and in the encoding of numbers that appear in number restrictions (if encoded in binary, thus polynomial in the value of such numbers). In many applications, it is reasonable to assume that the number of axioms in the TBox remains limited, so the first source of exponential blow-up may not be a major practical limitation. On the other hand, as illustrated in Section 1, values that appear in number restrictions (such as the one in $\geq 20000\, \mathit{enrolledIn}^-$) may depend on the size of the domain under consideration, and thus, in some applications, they are likely to be of the same order of magnitude as the size of the ABox itself. This might be unmanageable in practice, in scenarios where we have to deal with large amounts of data.

In the following, we describe how the rewriting algorithm of [4,5] can be adapted (for the same DL, source, and target query languages), so that the size of the output query remains linear in the size of the (binary) encoding of numbers in number restrictions. Moreover, this new rewriting guarantees that the number of algebraic operators in the output query only depends on the number of axioms of the TBox. The algorithm of [4,5] exploits the so-called chase model $\mathcal{I}_{can}^{\mathcal{K}}$ of $\mathcal{K}$. This model is such that, for $\textit{DL-Lite}_{core}^{\mathcal{N}^-}$ and rooted CQs, $\mathit{certAns}(q, \mathcal{K})$ and $\mathit{ans}(q, \mathcal{I}_{can}^{\mathcal{K}})$ coincide.

Specifically, we illustrate how the rewriting algorithm from [4,5] can be modified by running it on the same example as the one used in [4,5]. Whenever relevant, we will explicitly point out the differences between the two algorithms. The rewriting from [4,5] generates a set of queries that can be directly translated into a SQL query. The target language for this rewriting makes use of nested aggregation in the form of special “atoms” of the form $\exists^{=k} y.\, P(x, y)$, with $k \in \mathbb{N}$, which intuitively correspond to a SQL COUNT DISTINCT operation, together with a boolean condition stating that the result of the aggregation operation over $y$ must be equal to $k$. If the TBox contains an axiom of the form $C \sqsubseteq\ \geq n\, R$, then for each $k \in [1..n]$, the rewriting of [4,5] generates one sub-query. Hence, the number of generated sub-queries is linear in $n$, and exponential in the binary encoding of $n$, which makes it impractical.
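The gap between the value of $n$ and the size of its binary encoding is easy to quantify. The short Python sketch below only compares the two quantities for a few values of $n$; the function name `enumerated_subquery_count` is hypothetical and simply mirrors the "one sub-query per $k \in [1..n]$" behaviour described above.

```python
def enumerated_subquery_count(n):
    """Number of sub-queries when one is generated for each k in [1..n]."""
    return len(range(1, n + 1))


for n in (3, 1000, 20000):
    # n.bit_length() is the number of bits needed to write n in binary.
    print(n, n.bit_length(), enumerated_subquery_count(n))
```

For $n = 20000$, the number restriction is written with 15 bits, whereas an enumeration over $[1..n]$ yields 20 000 sub-queries.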

In our variant, we drop the boolean condition and use instead the notation $z = \mathit{count}(y).\, P(x, y)$, where $z$ is now a variable storing the result of the same aggregation operation. Like in [4,5], we use negation, multiplication, and subtraction. In addition, our target language also requires the SQL function greatest(x,y) (that returns the largest value between x and y), and the aggregation operator SUM. We show through our running example that, despite our modifications, the target language for the rewriting can still be translated into SQL.

[Figure: graph of the running example, with nodes $a$ (labelled $A$), $b$, $d$ and $e$, and edges labelled $P_1$ and $P_2$.]

$\langle Q(x, \mathit{cnt} = \mathit{count}(y_1, y_2)),\ \{q(x, y_1, y_2) \leftarrow A(x), P_1(x, y_1), P_2(y_1, y_2)\} \rangle$

Intuitively, such a query corresponds to the following SQL query (see footnote 4):

    SELECT A.x, COUNT(DISTINCT P1.y1, P2.y2) AS cnt
    FROM A, P1, P2
    WHERE A.x = P1.x AND P1.y1 = P2.y1
    GROUP BY A.x

Then each rewriting step selects a query $Q$ in $\mathcal{Q}$, and extends $\mathcal{Q}$ with a set of new queries, obtained by applying some rewriting rule to $Q$, until saturation is reached. In the previous query, since variable $y_2$ is unbound, we can apply a rewriting step over atom $P_2(y_1, y_2)$ with respect to axiom $\exists P_1^- \sqsubseteq\ \geq 3\, P_2$, producing the query:

$\langle Q(x, \mathit{cnt} = \mathit{sum}(\mathit{num})),\ \langle Q(x, y_1, \mathit{num} = \mathit{greatest}(0, 3 - z)),\ \{q(x, y_1) \leftarrow A(x), P_1(x, y_1), P_1(\_, y_1)\} \wedge z = \mathit{cnt}(y_2).\, P_2(y_1, y_2) \rangle \rangle$

This query corresponds to the following SQL query:

    SELECT x, SUM(num) AS cnt
    FROM (
      SELECT J1.x, J1.y1, greatest(0, 3 - J2.z) AS num
      FROM (
        SELECT A.x, P1.y1
        FROM A, P1, (SELECT x AS _, y1 FROM P1) AS P1a
        WHERE A.x = P1.x AND P1.y1 = P1a.y1
      ) AS J1,
      (
        SELECT y1, COUNT(DISTINCT y2) AS z
        FROM P2
        GROUP BY y1
      ) AS J2
      WHERE J1.y1 = J2.y1
    ) AS J
    GROUP BY x

4 Note that the COUNT DISTINCT operator of SQL does not allow for multiple arguments. However, the desired result can be achieved through an injective concatenation operator, for instance making use of an additional fresh symbol.

The interested reader might observe that such a query fully captures the three queries (one for each non-zero value of num) from [4,5]. On such a query, one can apply a variant of the “reduce” rule of PerfectRef [6], leading to the query:

$\langle Q(x, \mathit{cnt} = \mathit{sum}(\mathit{num})),\ \langle Q(x, y_1, \mathit{num} = \mathit{greatest}(0, 3 - z)),\ \{q(x, y_1) \leftarrow A(x), P_1(x, y_1), P_1(\_, y_1);\ \ q(x, y_1) \leftarrow A(x), P_1(x, y_1)\} \wedge z = \mathit{cnt}(y_2).\, P_2(y_1, y_2) \rangle \rangle$

This query corresponds to the following SQL query:

    SELECT x, SUM(num) AS cnt
    FROM (
      SELECT J1.x, J1.y1, greatest(0, 3 - J2.z) AS num
      FROM (
        (SELECT A.x, P1.y1
         FROM A, P1, (SELECT x AS _, y1 FROM P1) AS P1a
         WHERE A.x = P1.x AND P1.y1 = P1a.y1)
        UNION
        (SELECT A.x, P1.y1
         FROM A, P1
         WHERE A.x = P1.x)
      ) AS J1,
      (
        SELECT y1, COUNT(DISTINCT y2) AS z
        FROM P2
        GROUP BY y1
      ) AS J2
      WHERE J1.y1 = J2.y1
    ) AS J
    GROUP BY x

Again, in the rewriting from [4,5], this very step produces three different queries: one for each non-zero value of num.

By applying another rewriting step over atom $P_1(x, y_1)$ and with respect to axiom $A \sqsubseteq\ \geq 2\, P_1$, we obtain the following query:

$\langle Q(x, \mathit{cnt} = \mathit{sum}(\mathit{num})),\ \langle Q(x, \mathit{num} = \mathit{greatest}(0, 2 - z) \cdot 3),\ \{q(x) \leftarrow A(x)\} \wedge z = \mathit{cnt}(y_1).\, P_1(x, y_1) \rangle \rangle$

This query corresponds to the following SQL query:

    SELECT x, SUM(num) AS cnt
    FROM (
      SELECT J1.x, greatest(0, 2 - J2.z) * 3 AS num
      FROM (SELECT x FROM A) AS J1,
           (SELECT x, COUNT(DISTINCT y1) AS z
            FROM P1
            GROUP BY x) AS J2
      WHERE J1.x = J2.x
    ) AS J
    GROUP BY x

Let us now analyze the queries produced by the rewriting. The query after the initialization step returns the number of paths $(x, y_1, y_2)$ in $\mathcal{A}$ conforming to the structure dictated by the body of the input query. Since there are two such paths, this query returns the answer $\{x \mapsto a, \mathit{cnt} \mapsto 2\}$. The query produced after the first rewriting step checks for all sub-paths $(x, y_1)$ of $(x, y_1, y_2)$ such that $x$ is an instance of $A$, $y_1$ is a $P_1$-successor of $x$, and $y_1$ has fewer $P_2$-successors in the ABox than what the TBox prescribes. There is one such path in $\mathcal{I}_{can}^{\mathcal{K}}$, namely the one terminating in node $b$, which has only two $P_2$-successors in $\mathcal{A}$. This path is captured by our query, which returns as answer $\{x \mapsto a, \mathit{cnt} \mapsto 1\}$: indeed, there is a single way of extending this path into the anonymous part. The last query has to be interpreted in a similar way. The query retrieves the individual $a$, since this node has only one $P_1$-successor in $\mathcal{A}$ but should have at least two $P_1$-successors according to $\mathcal{T}$. The answer to such a query is $\{x \mapsto a, \mathit{cnt} \mapsto 3\}$. Indeed, there are three ways of extending the match $\{x \mapsto a\}$ into the anonymous part. Now, a last aggregation (with SUM) over all queries in $\mathcal{Q}$ yields the desired answer $\{x \mapsto a, \mathit{cnt} \mapsto 6\}$, which indeed corresponds to the answer $\langle \{x \mapsto a\}, 6 \rangle$ to our input query over $\mathcal{I}_{can}^{\mathcal{K}}$.
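The three counts discussed above (2, 1 and 3) and their sum can be reproduced with an in-memory SQLite database. The Python sketch below rests on several assumptions of ours: the ABox $\{A(a), P_1(a, b), P_2(b, d), P_2(b, e)\}$ is reconstructed from the description above, the table and column names follow the SQL queries of this section, the multi-argument COUNT DISTINCT is replaced by a count over a SELECT DISTINCT subquery, and SQLite's two-argument scalar `max` stands in for `greatest`, which SQLite does not provide. The first rewriting step is given in its reduced form (without the redundant atom $P_1(\_, y_1)$).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A  (x TEXT);
CREATE TABLE P1 (x TEXT, y1 TEXT);
CREATE TABLE P2 (y1 TEXT, y2 TEXT);
INSERT INTO A  VALUES ('a');
INSERT INTO P1 VALUES ('a', 'b');
INSERT INTO P2 VALUES ('b', 'd'), ('b', 'e');
""")

# Initialization step: paths (x, y1, y2) that lie entirely in the ABox.
INITIAL = """
SELECT x, COUNT(*) AS cnt
FROM (SELECT DISTINCT A.x AS x, P1.y1 AS y1, P2.y2 AS y2
      FROM A, P1, P2
      WHERE A.x = P1.x AND P1.y1 = P2.y1)
GROUP BY x
"""

# Step for the axiom requiring at least 3 P2-successors (reduced form).
STEP_1 = """
SELECT x, SUM(num) AS cnt
FROM (SELECT J1.x AS x, J1.y1 AS y1, max(0, 3 - J2.z) AS num
      FROM (SELECT A.x AS x, P1.y1 AS y1 FROM A, P1 WHERE A.x = P1.x) AS J1,
           (SELECT y1, COUNT(DISTINCT y2) AS z FROM P2 GROUP BY y1) AS J2
      WHERE J1.y1 = J2.y1)
GROUP BY x
"""

# Step for the axiom requiring at least 2 P1-successors.
STEP_2 = """
SELECT x, SUM(num) AS cnt
FROM (SELECT J1.x AS x, max(0, 2 - J2.z) * 3 AS num
      FROM (SELECT x FROM A) AS J1,
           (SELECT x, COUNT(DISTINCT y1) AS z FROM P1 GROUP BY x) AS J2
      WHERE J1.x = J2.x)
GROUP BY x
"""

# Final aggregation: sum the counts of all queries, per value of x.
totals = {}
for query in (INITIAL, STEP_1, STEP_2):
    for x, cnt in conn.execute(query):
        totals[x] = totals.get(x, 0) + cnt

print(totals)  # {'a': 6}
```

Here the final SUM is carried out in Python for brevity; in a single SQL statement it could equally be expressed as an aggregation over the UNION ALL of the three queries.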

5 Conclusions

In this paper, we have improved the query rewriting technique proposed in [4,5] for rooted CQs under count semantics over a $\textit{DL-Lite}_{core}^{\mathcal{N}^-}$ KB, into a target query language that extends RA with aggregation and arithmetic functions. We have illustrated how the exponential blow-up of the output query in the numbers that occur in the TBox, characteristic of the rewriting of [4,5], can be avoided by extending the target rewriting language, specifically with the aggregate operator sum and with the binary function greatest. We have also shown that this enriched language can still be translated into SQL. Considering that numeric values that appear in the TBox may in practice be of the same order of magnitude as the size of the ABox, this new rewriting algorithm is a significant step towards a practical implementation of query answering under this semantics.

Acknowledgements

This research has been partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation, by the Italian Basic Research (PRIN) project HOPE, by the EU H2020 project INODE, grant agreement 863410, by the CHIST-ERA project PACMEL, by the project IDEE (FESR1133) funded through the European Regional Development Fund (ERDF) Investment for Growth and Jobs Programme 2014-2020, by the project “South Tyrol Longitudinal study on Longevity and Ageing” (STyLoLa), funded through the 2017 Interdisciplinary Call issued by the Research Committee of the Free University of Bozen-Bolzano, and by the project “Ontology-based Geodata Integration, Visualization and Analysis” (OntoGeo), funded through the 2018 CRC Call issued by the Research Committee of the Free University of Bozen-Bolzano.

References

1. A. Artale, D. Calvanese, R. Kontchakov, and M. Zakharyaschev. The DL-Lite family and relations. J. of Artificial Intelligence Research, 36:1-69, 2009.

2. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, 2003.

3. M. Bienvenu and M. Ortiz. Ontology-mediated query answering with data-tractable description logics. In Reasoning Web: Web Logic Rules - 11th Int. Summer School Tutorial Lectures (RW), volume 9203 of LNCS, pages 218-307. Springer, 2015.

4. D. Calvanese, J. Corman, D. Lanti, and S. Razniewski. Counting query answers over a DL-Lite knowledge base. In Proc. of the 29th Int. Joint Conf. on Artificial Intelligence (IJCAI). IJCAI Org., 2020.

5. D. Calvanese, J. Corman, D. Lanti, and S. Razniewski. Counting query answers over a DL-Lite knowledge base (Extended version). CoRR Tech. Rep. arXiv:2005.05886, arXiv.org e-Print archive, 2020. Available at http://arxiv.org/abs/2005.05886.

6. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning, 39(3):385-429, 2007.

7. D. Calvanese, G. De Giacomo, and M. Lenzerini. Conjunctive query containment and answering under description logics constraints. ACM Trans. on Comp. Logic, 9(3):22.1-22.31, 2008.

8. D. Calvanese, E. Kharlamov, W. Nutt, and C. Thorne. Aggregate queries over ontologies. In Proc. of the 2nd Int. Workshop on Ontologies and Inf. Systems for the Semantic Web (ONISW), pages 97-104, 2008.

9. H. Chen and S. Mengel. Counting answers to existential positive queries: a complexity classification. In Proc. of the 35th ACM Symp. on Principles of Database Systems (PODS), pages 315-326, 2016.

10. S. Grumbach and T. Milo. Towards tractable algebras for bags. J. of Computer and System Sciences, 52(3):570-588, 1996.

11. E. V. Kostylev and J. L. Reutter. Complexity of answering counting aggregate queries over DL-Lite. J. of Web Semantics, 33:94-111, 2015.

12. L. Libkin and L. Wong. Query languages for bags and aggregate functions. J. of Computer and System Sciences, 55(2):241-272, 1997.

13. B. Motik, B. Cuenca Grau, I. Horrocks, Z. Wu, A. Fokoue, and C. Lutz. OWL 2 Web Ontology Language profiles (2nd ed.). W3C Rec., W3C, 2012. http://www.w3.org/TR/owl2-profiles/.

14. C. Nikolaou, E. V. Kostylev, G. Konstantinidis, M. Kaminski, B. Cuenca Grau, and I. Horrocks. Foundations of ontology-based data access under bag semantics. Artificial Intelligence, 274:91-132, 2019.

15. R. Pichler and S. Skritek. Tractable counting of the answers to conjunctive queries. J. of Computer and System Sciences, 79(6):984-1001, 2013.

16. G. Xiao, D. Calvanese, R. Kontchakov, D. Lembo, A. Poggi, R. Rosati, and M. Zakharyaschev. Ontology-based data access: A survey. In Proc. of the 27th Int. Joint Conf. on Artificial Intelligence (IJCAI), pages 5511-5519. IJCAI Org., 2018.