Practical higher-order query answering over Hi (DL-LiteR ) knowledge bases Maurizio Lenzerini, Lorenzo Lepore, Antonella Poggi Dipartimento di Ingegneria Informatica, Automatica e Gestionale “Antonio Ruberti”- Sapienza Università di Roma Via Ariosto 25, I-00183 Roma, Italy lastname@dis.uniroma1.it Abstract. The language Hi(DL-LiteR ) is obtained from DL-LiteR by adding meta-modeling features, and is equipped with a query language that is able to express higher-order queries. We investigate the problem of answering a par- ticular class of such queries, called instance higher-order queries posed over Hi(DL-LiteR ) knowledge bases (KBs). The only existing algorithm for this prob- lem is based on the idea of reducing the evaluation of a higher-order query Q over a Hi(DL-LiteR ) KB to the evaluation of a union of first-order queries over a DL-LiteR KB, built from Q by instantiating all metavariables in all possible ways. Even though of polynomial time complexity with respect to the size of the KB, this algorithm turns out to be inefficient in practice. In this paper we present a new algorithm, called Smart Binding Planner (SBP), that compiles Q into a pro- gram, that issues a sequence of first-order conjunctive queries, where each query has the goal of providing the bindings for meta-variables of the next ones, and the last one completes the process by computing the answers to Q. We also illustrate some experiments showing that, in practice, SBP is significantly more efficient than the previous approach. 1 Introduction Description Logics (DLs) are popular formalisms for expressing ontologies, where an ontology is regarded as a formal specification of the concepts that are relevant in the domain of interest, together with the relationships between concepts. In many applica- tions, the need arises of modeling and reasoning about metaconcepts and metaproper- ties. Roughly speaking, a metaconcept is a concept whose instances can be themselves concepts, and a metaproperty is a relationship between metaconcepts. Metaconcept rep- resentation is needed, for example, in formal ontology, where specific metaproperties (e.g., rigidity) are used to express relevant aspects of the intended meaning of the el- ements in an ontology. More generally, the idea of representing concepts and prop- erties at the metalevel has been exploited in several sub-fields of Knowledge Repre- sentation and Computer Science, including semantic networks, early Frame-based and Description-based systems [10, 1], and conceptual modeling languages [12]. The notion of metaclass is also present in virtually all object-oriented languages, including modern programming languages (see, for instance, the Java class “class”). To see an example of metamodeling, consider the domain of computers, where we want to describe the properties of single computer machines (concept Computer in Fig- ure 1), like the serial number and the manufacture date, the properties of the computer models (Model), like the name and the country where computer machines of that mod- els are produced, and the category (Category) of the various models. Thus, for instance, one could assert that a specific computer machine (e.g. ko94-pol) is an instance of Alienware (subconcept of Computer), where Alienware is an instance of Laptop model (subconcept of Model), and Laptop model is an instance of Portable category (subcon- cept of Category). Inst C (ko94-po1, Alienware) Inst C (jh56i-09, iMac) Inst C (Alienware, Laptop Model) Inst C (iMac, Desktop Model) Inst C (Laptop Model, Portable Category) Inst C (Desktop Model, Non Portable Category) Inst R (ko94-po1, Alienware, has model) Inst R (jh56i-09, iMac, has model) Isa C (Alienware, Computer) Isa C (iMac, Computer) Isa C (Computer,Exists(manufacture date)) Isa C (Computer,Exists(serial number)) Isa C (Exists(serial number),Computer) Isa C (Exists(Inv (has model)), Model) Isa C (Computer,Exists(has model)) Isa C (Exists(has model),Computer) Isa C (Laptop Model, Model) Isa C (Desktop Model, Model) Isa C (Model,Exists(has category)) Isa C (Exists(has category),Model) Isa C (Exists(Inv (has category)), Category) Isa C (Model,Exists(name)) Isa C (Exists(Inv (produced in)), Country) Isa C (Model,Exists(produced in)) Isa C (Portable Category, Category) Isa C (Non Portable Category, Category) Disj C (Alienware, iMac) Disj C (Laptop Model, Desktop Model) Disj C (Portable Category, Non Portable Category) (a) List of assertions (b) Diagrammatic representation Fig. 1: Hi (DL-LiteR ) KB of computer domain Obviously, the benefit of using metamodeling is greatly limited if one cannot ex- press metaqueries on the knowledge base. A metaquery is an expression that combines predicates and metapredicates in order to extract complex patterns from the knowledge base. In particular, in a metaquery, variables can appear in predicate positions. In the example above, one might be interested in asking for all computer machines that are instances of a model that is an instance of Portable category, thus traversing three levels of the instance-of relation. It is well-known that in logic, higher-order constructs are needed for a correct rep- resentation of concepts and properties at the meta-level. However, current research on Description Logics only rarely addresses the issue of extending the language with higher-order constructs (see, for instance, [2, 5, 11, 13, 9]). The starting point of our investigation is the work presented in [6, 7], where the authors propose a method to add medamodeling facilities to Description Logics, and study algorithms for answering metaqueries in a particular logic, called Hi (DL-LiteR ). The logic Hi (DL-LiteR ) is ob- tained from DL-LiteR [4] by adding suitable constructs for metaconcepts and metaprop- erties modeling. Since the logics of the DL-Lite family are equipped with query an- swering algorithms with nice computational properties with respect to the size of the data, we believe that Hi (DL-LiteR ) is a good candidate for a language that can be used in ontology-based data access applications requiring to model metaconcepts and metaproperties. It was shown in [6] that answering higher-order unions of conjunctive queries (HUCQs) over Hi (DL-LiteR ) KBs is in general intractable, while answering instance- based HUCQ (IHUCQ) has the same complexity as answering standard (i.e., fist-order) unions of conjunctive queries over DL-LiteR KBs. In particular, in [6] the authors present a query answering technique based on the idea of first transforming an IHUCQ Q over a Hi (DL-LiteR ) KBs H into a IHUCQ Q0 and then computing the certain answers to Q0 over H seen as a standard DL-LiteR KB. The goal of the technique was essentially to show that the data complexity class of query answering does not change if we add metamodeling to DL-LiteR . However, the technique presented in the paper turns out to be impractical in real world cases. Indeed, in most cases, the query answering algorithm computes the union of a huge number of metaground IHCQs, most of which can be in fact empty. This is due to the fact that the metagrounding is computed “blindly”, i.e., by essentially instantiating in all possible ways all variables with the elements that are in predicate positions. In this paper, we present a new algorithm for answering IHUCQs posed over Hi (DL-LiteR ) knowledge bases, called Smart Binding Planner (SBP). The basic idea of the algorithm is to compile the query q into a program expressed in a Datalog-like language, that issues a sequence of first-order conjunctive queries, where each query has the goal of providing the bindings for meta-predicates of the next ones, and the last one completes the process by computing the answers to q. We also illustrate some pre- liminary experiments showing that, in practice, SBP is significantly more efficient than the previous approach. Note that our query language corresponds to a subset of SPARQL 1.1 interpreted under the OWL 2 direct semantics entailment regime [8], in particular the subset con- stituted by unions of conjunctive queries using only atoms on the instance-of relations. Thus, our algorithm can be seen as a first effective approach to answering such queries, but in the context where the KB has richer meta-modeling features than just punning. The paper is organized as follows. In Section 2 we describe the Hi (DL-LiteR ) logic. In section 3 we discuss the problem of expressing and evaluating IHUCQs in Hi (DL-LiteR ), and we briefly illustrate the algorithm presented in [6]. In Section 4 we present our new algorithm, and Section 5 describes a set of experiments aiming at illus- trating the behavior of our algorithm and at comparing it with the previous technique. Section 6 concludes the paper. 2 Hi (DL-LiteR ) In the following we recall the basics of Hi (DL-LiteR ) [6], by first presenting its syntax and then its semantics. Following [6], we can characterize a traditional DL L by a set OP (L) of operators, used to form concept and role expressions, and a set of MP (L) of metapredicates, used to form assertions. Each operator and each metapredicate have an associated arity. If symbol S has arity n, then we write S/n to denote such a sym- bol and its arity. For DL-LiteR , we have: (i) OP (DL-LiteR ) = {Inv /1, Exists/1}, which stand for “inverse role”, and “existential restriction”, (ii) MP (DL-LiteR ) = {Inst C /2, Inst R /3, Isa C /2, Isa R /2, Disj C /2, Disj R /2}, which stand for instance, isa and disjoint assertions on both concepts and roles. Syntax. Given a countably infinite alphabet S of element names, we inductively define the set of expressions for Hi (DL-LiteR ), denoted by EDL-LiteR (S), over the alphabet S as follows: – if e ∈ S, then e ∈ EDL-LiteR (S); – if e ∈ EDL-LiteR (S), and e is not of the form Inv (e0 ) (where e0 is any expression), then Inv (e) ∈ EDL-LiteR (S); – if e ∈ EDL-LiteR (S), then Exists(e) ∈ EDL-LiteR (S). Intuitively, expressions denote elements, i.e., individuals, concepts and roles, of the knowledge base. The names in S are the symbols denoting the atomic elements, while the expressions denote either atomic elements, inverses of atomic elements (when in- terpreted as roles), or projections of atomic elements on either the first or the second component (again, when interpreted such elements as roles). The basic building blocks of a Hi (DL-LiteR ) knowledge base are assertions. A DL-LiteR -assertion, or simply assertion, over the alphabet S for Hi (DL-LiteR ) is a statement of one of the forms a1 (e1 , e2 ), a2 (e1 , e2 , e3 ) where a1 ∈ MP (DL-LiteR ) is either Inst C , Isa C , Isa R , Disj C , Disj R , a2 is Inst R , and e1 , e2 , e3 are expressions in EDL-LiteR (S). Thus, an assertion is simply an application of a metapredicate to a set of expressions, which intuitively means that an assertion is an axiom that predicates over a set of individuals, concepts or roles. A Hi(DL-LiteR ) knowledge base (KB) over S is a finite set of Hi (DL-LiteR )-assertions over S. For the sake of brevity, we do not include here the semantics of Hi (DL-LiteR ) and we refer the reader to [7]. We refer to [7] also for the notions of expressions occurring as respectively object,concept and role argument in one assertion. 3 Querying Hi (DL-LiteR ) knowledge bases In this section we address the issue of querying Hi (DL-LiteR ) knowledge bases. Specif- ically, we define queries over Hi (DL-LiteR ) knowledge bases and report on the only query answering technique over such knowledge bases we are aware of. In the next section, we will then propose our new technique for query answering. In order to define Hi (DL-LiteR ) queries, we first need to introduce the no- tion of “atom”. We consider a countably infinite alphabet of variables V, dis- joint from S, and a countably infinite alphabet of query predicates, each with its arity, disjoint from all other alphabets. An atom over S and V (or simply, an atom) has the form a1 (e1 , e2 ), a2 (e1 , e2 , e3 ) where a1 ∈ MP (DL-LiteR ) is either Inst C , Isa C , Isa R , Disj C or Disj R , a2 is Inst R , and each ei is either an expression in EDL-LiteR (S) or a variable in V, i.e., ei ∈ EDL-LiteR (S) ∪ V. In the above assertions, a1 and a2 are called the predicates of the atom. A higher-order conjunctive query (HCQ) of arity n is a formula of the form q(u1 , . . . , un ) ← a1 , . . . , am where n ≥ 0, m ≥ 1, q is a query predicate of arity n (which is also the arity of the query), each ai is an atom, and each ui is either in V and occurs in some aj , or it is in EDL-LiteR (S). As usual, the tuple u = (u1 , . . . , un ) is called the target of the query q and the variables in u are its distinguished vari- ables. The other variables of the query are called existential variables. Also, similarly to names within assertions, we say that a variable occurs as concept argument or role argument within an atom, depending on its position. Variables occurring as concept or role arguments are called metavariables. A higher-order union of conjunctive queries (HUCQ) of arity n is a set of HCQs of arity n with the same query predicate. Also a HUCQ is metaground if it does not contain any metavariable. A HCQ is called instance higher-order conjunctive query (IHCQ) if each of its atoms has Inst C and Inst R as predicate. A HUCQ containing only IHCQs is called instance higher-order union of conjunctive queries (IHUCQ). Finally, a higher-order query is called Boolean if it has arity 0. Example 1. Consider the Hi (DL-LiteR ) KB H in Figure 1. Suppose one needs to re- trieve all items, together with their model and the model category, but only if the model category is classified as ’Portable Computer’. The IHCQ expressing this need is the following: q(x, y, z) ← Inst C (x, y), Inst C (y, z), Inst C (z,’Portable Category’) In order to define the semantics of queries, we now introduce the notion of assign- ment. Indeed, to interpret non-ground terms, we need assignments over interpretations, where an assignment µ over hΣ, Io i is a function µ : V → ∆. Given an interpreta- tion I = hΣ, Io i and an assignment µ over I, the interpretation of terms is specified by the function (·)Io ,µ : EDL-LiteR (S) ∪ V → ∆ defined as follows: (i) if e ∈ S then eIo ,µ = eIo ; (ii) if e ∈ V then eIo ,µ = µ(e); (iii) op(e)Io ,µ = opIo (eIo ,µ ). Finally, we define the semantics of atoms, by defining the notion of satisfaction of an atom with respect to an interpretation I and an assignment µ over I as follows: – I, µ |= Inst C (e1 , e2 ) if eI1 o ,µ ∈ (eI2 o ,µ )Ic ; – I, µ|=Inst R (e1 , e2 , e3 ) if heI1 o ,µ , eI2 o ,µ i∈(eI3 o ,µ )Ir ; – I, µ |= Isa C (e1 , e2 ) if (eI1 o ,µ )Ic ⊆ (eI2 o ,µ )Ic ; – I, µ |= Isa R (e1 , e2 ) if (eI1 o ,µ )Ir ⊆ (eI2 o ,µ )Ir ; – I, µ |= Disj C (e1 , e2 ) if (eI1 o ,µ )Ic ∩ (eI2 o ,µ )Ic = ∅; – I, µ |= Disj R (e1 , e2 ) if (eI1 o ,µ )Ir ∩ (eI2 o ,µ )Ir = ∅. Let I be an interpretation and µ an assignment over I. A Boolean HCQ q of the form q ← a1 , . . . , an is satisfied in I, µ if every query atom ai is satisfied in I, µ. Given a Boolean HCQ q and a Hi (DL-LiteR ) KB H, we say that q is logically implied by H (denoted by H |= q) if for each model I of H there exists an assignment µ such that q is satisfied by I, µ. Given a non-Boolean HCQ q of the form q(e1 , . . . , en ) ← a1 , . . . , am , a grounding substitution of q is a substitution θ such that e1 θ, . . . , en θ are ground terms. We call e1 θ, . . . , en θ a grounding tuple. The set of certain answers to q in H, denoted by cert(q, H), is the set of grounding tuples e1 θ, . . . , en θ that make the Boolean query qθ ← a1 θ, . . . , an θ logically implied by H. These notions extend immediately to HUCQs. As we said in the introduction, in [6] the authors present a query answering tech- nique based on the idea of first transforming a IHUCQ Q over a Hi (DL-LiteR ) KBs H into a metaground IHUCQ Q0 and then computing the certain answers to Q0 over H seen as a standard DL-LiteR KB.1 It was shown that query answering with such a tech- nique is in AC0 w.r.t. the number of instance assertions, in PT IME w.r.t. KB complexity, and NP-complete w.r.t. combined complexity. While describing, in the following, the technique of [6] in more details, we introduce notions that we use in the next sections. Given a Hi (DL-LiteR ) KB H, we respectively denote by Roles(H) and Concepts(H) the sets Roles(H) = {e, Inv (e)|e ∈ EDL-LiteR (S) and e occurs as role 1 In fact, in [6], the authors consider KBs equipped with mappings, and therefore their technique aims at rewriting the initial query over the actual data sources. Here, we ignore this aspect and adapt their algorithm to our scenario. argument in H} and Concepts(H) = {e | e ∈ EDL-LiteR (S) and e occurs as concept argument in H} ∪ {Exists(e), Exists(Inv (e)) | e ∈ EDL-LiteR (S) and e occurs as role argument in H}. Given two IHCQs q, q 0 and a KB H, we say that q 0 is a partial meta- grounding of q with respect to H if q 0 = σ(q) where σ is a partial substitution of the metavariables of q such that for each metavariable x of q, either σ(x) = x, or: (i) if x occurs in concept position in q, then σ(x) ∈ Concepts(H); (ii) if x occurs in role position in q, then σ(x) ∈ Roles(H). Given a IHCQ q and a KB H, we denote by P M G(q, H) the IHUCQ constituted by the union of all the partial metagroundings of q w.r.t. H. Moreover Sgiven a IHUCQ Q and a KB H, we denote by P M G(Q, H) the IHUCQ formed by q∈Q P M G(q, H). With these notions in place, the algorithm of [6], which we denote as C OM - PUTE PMG, can be described as follows. Given a IHUCQ Q and a Hi (DL-LiteR ) KB H, it first computes the query P M G(Q, H) and then returns the certain answers to the query P M G(Q, H) with respect to H. It is worth noting that, although polynomial w.r.t. the number of instance assertions in H, C OMPUTE PMG can be very inefficient in practice, in the sense that it computes the union of a huge number of IHCQs, most of which can be in fact empty. This is due to the fact that the partial metagrounding is computed “blindly”, i.e., by instantiating in all possible ways, all metavariables with elements of Concepts(H) and Roles(H). 4 New algorithm for answering IHUCQs in Hi (DL-LiteR ) In this section we present the Smart Binding Planner (SBP) algorithm, aiming at re- ducing the number of metaground queries to be evaluated. The algorithm is based on a process that computes a sequence of metaground IHCQs, where each query has the goal of providing both the answers to the query, and the bindings for metavariables of the next ones. Before delving into the details of the SBP algorithm, we shortly sketch its three main steps. First, it splits the query into a sequence of subqueries (function S PLITA N - D O RDER ) such that the evaluation of the i-th subquery provides the bindings to instan- tiate the metavariables of the (i + 1)-th subquery. Second, based on such subqueries ordering, it builds a program (function DHQPS YNTHESIS), expressed in a specific lan- guage named Datalog-based Higher Order Query Plan (DHQP). Then, as third and final step, it evaluates the DHQP program (function E VALUATE DHQP). In the following, we start by presenting the DHQP language and illustrating the function E VALUATE DHQP. We then define the two other functions that are used by SBP, namely the functions S PLITA ND O RDER and DHQPS YNTHESIS, and finally we present the complete algorithm SBP, and discuss its properties. 4.1 The DHQP language Let AΓ and AI be two pairwise disjoint alphabets, disjoint from S and V, called, re- spectively, the alphabet of bridge predicates and the alphabet of intensional predicates, each one with an associated arity. Intensional and bridge predicates are used to form a DHQP program, which, intuitively, is constitued by an ordered set of Datalog-like rules whose head predicate is an intensional predicate in AI , and whose body contains an atom with a bridge predicate in AΓ . Every bridge predicate γ of arity n has an asso- ciated IHCQ of arity n, denoted def (γ). Moreover, some of the n arguments of each bridge predicate γ are classified as “to-be-bound”. When we write a bridge predicate, we indicate the variables corresponding to such arguments by underlining them. Intu- itively, the underlined variables of the bridge predicate γ are those variables that are to be bound to ground expressions whenever we want to evaluate the query def (γ). Also, the bridge predicate is used to denote the relation where we store the certain answers of a set of metaground IHCQs, each one obtained by def (γ) by substituting the un- derlined variables of γ with a ground expression. Variables of the query def (γ) that appear underlined in the corresponding bridge predicate γ are called input variables of the query. Note that, when we write the query predicate of def (γ), we underline the query input variables. We now provide the definition of the syntax of a DHQP program. Let m be a non- negative integer, let Γ be a sequence (γ0 , . . . , γm ) of m + 1 bridge predicates in AΓ , and let I be a sequence (I0 , . . . , Im ) of m + 1 intensional predicates in AI . A DHQP program P over Γ and I is a sequence of m + 1 DHQP rules r0 , r1 , . . . , rm , such that – r0 , called the base rule of P , has the form I0 (v) ← γ0 (w), where every variable in v occurs in w. – for each 1 ≤ j ≤ m, rule rj has the form Ij (v) ← Ij−1 (u), γj (w), where every variable in v occurs in u or w, and every variable in u which is not an underlined variable in w appears in v. For every 0 ≤ j ≤ m, we say that rj defines Ij using γj . In the case of 1 ≤ j ≤ m, we say that rj defines Ij using γj based on Ij−1 . Example 2. Let I = (I0 /1, I1 /2, I2 /3), and Γ = (γ0 /1, γ1 /2, γ2 /2), with qγ0 , qγ1 , and qγ2 defined as follows: – qγ0 (z) = Inst C (z, Portable Category), – qγ1 (z, y) = Inst C (y, z), – qγ2 (y, y) = Inst C (x, y). The following is a DHQP program over Γ and I, called P : r0 : I0 (z) ← γ0 (z) r1 : I1 (y, z) ← I0 (z), γ1 (z, y) r2 : I2 (x, y, z) ← I1 (y, z), γ2 (y, x) We now turn our attention to the semantics of a DHQP program. Toward this goal, we introduce the notion of instantiation of an IHCQ. Given an IHCQ q, an n-tuple x of variables of q, and an n-tuple t of expressions in EDL-LiteR (S), the x-instantiation of q with t, denoted IN ST (q, x ← t), is the IHCQ obtained from q by substituting each xi in x with the expression ti in t, for i ∈ {1, . . . , n}. Note that a partial metagrounding of q with respect to H is an x-instantiation of q with any n-tuple t = (t1 , t2 , . . . , tn ), where x = (x1 , x2 , . . . , xn ) contains only metavariables in q, and such that, for every i ∈ {1, . . . , n}, ti ∈ Concept(H) if xi occurs as concept argument within q, and ti ∈ Role(H) if xi occurs as role argument. Let P be a DHQP program constituted by the rules (r0 , . . . , rm ). We specify the semantics of P operationally, i.e., we define an algorithm, called E VALUATE DHQP, that, given as input a DHQP program P , and a Hi (DL-LiteR ) KB H, computes the result of evaluating P with respect to a H as follows: – Consider rule r0 , and compute the extension of I0 as the certain answers to the query P M G(qγ0 , H) over the KB H. – For each 1 ≤ i ≤ m, consider rule ri , and compute the extension of Ii by eval- uating the Datalog rule Ii (v) ← Ii−1 (u), γi (w) , where the extension of Ii−1 is the result computed when considering rule ri−1 , and the extension of γi is con- 0 S of the query P M G(q 0, H) over the KB0 H, with stituted by the certain answers 0 q constructed as follows: t∈π 0 (Ii−1 ) IN ST (qγi , w ← t), where w denotes w the input variables of qγi occurring both in u and in w, and πw0 (R) denotes the projection of relation R onto the arguments corresponding to w0 . – Return the extension of the predicate Im as the result of evaluating P over H. Example 3. Consider the DHQP program P described in Example 2, and the KB H in Figure 1. The algorithm E VALUATE DHQP(P, H) proceeds as follows. First, it consid- ers rule r0 and computes the certain answers of the metaground query qγ0 with respect to H, which, in this case, produces the following result: {(Laptop Model)}. By means of the rule r0 , this result becomes also the extension of I0 . Then, it considers rule r1 , and, for each tuple of I0 (in this case, only one) evaluates the query resulting from the z- instantiation of qγ1 with the tuple itself; this means that the certain answers to the meta- ground query q(y) ← Inst C (y, Laptop Model) (i.e., {(Alienware)}) are computed and stored in the relation γ1 (z, y). Such certain answers are used in rule r1 to compute the extension of I1 , which becomes {(Laptop Model, Alienware)}. Finally, the algorithm considers r2 , and computes the final result {(ko94-po1, Alienware, Laptop Model)}. 4.2 The function S PLITA ND O RDER Given an IHCQ q with distinguished variables x, S PLITA ND O RDER returns a sequence of subqueries of q by proceeding as follows. 1. It builds the dependency graph of q. The dependency graph of q is a directed graph whose nodes are the atoms of q, and the edges are formed according to the fol- lowing rule: there is an edge from an atom a1 to an atom a2 if it at least one of the following is verified: (i) a2 depends on a1 , i.e., if the expression occurring as predicate argument in a2 is a variable occurring as object argument in a1 ; (ii) there exists a variable x of Q that is not a metavariable and x occurs as object argument in both a1 and a2 (note that in this case, there will be an edge also from a1 to a2 ). Intuitively, a2 depends on a1 , if by evaluating the query with body a1 , one obtains bindings for the metavariable of a2 , which allow to instantiate and then evaluate the subquery with body a2 . 2. It assigns to each atom a a depth d(a) by using the following breadth-first strategy: (a) k ← 0; (b) do the following until the graph is empty: 2.1 assign k to each atom n occurring in a strongly connected component C such that every atom in C has all of its incoming edges coming from C; 2.2 remove from the graph all the considered atoms and their edges; 2.3 k ← k + 1. 3. For every atom a, it marks every variable that is a metavariable and occurs in an atom a0 such that d(a0 ) < d(a). Intuitively, if a variable within an atom is marked, then bindings for such a variable will be provided in order to evaluate the query with body that atom by using a standard DL-LiteR reasoner. 4. For every i ∈ {0, m}, where m is the maximal atom depth, it defines an IHCQ qγi whose arity is the number of variables occurring in some atom at depth i, and whose form is qγi (uγi , ui ) ← bγi , where uγi contains all variables that are marked in some atom at depth i, ui contains all unmarked variables occurring in some atom at depth i that also belong to x or, for i < m are marked in some atom at depth i + 1, and bγi is the conjunction of all atoms at depth i. Intuitively, atoms having the same depth will be evaluated in the same query. The queries qγ0 , . . . , qγm built by S PLITA ND O RDER(q) will be used by the function DHQPS YNTHESIS to construct the DHQP program whose evaluation will return the answers to q. Example 4. Consider the IHCQ presented in Example 1. The result of the S PLITA N - D O RDER function is the sequence B = (qγ0 , qγ1 , qγ2 ) presented in Example 2. 4.3 The function DHQPS YNTHESIS Given a IHCQ q with target tuple x, and the sequence (qγ0 , qγ1 , . . . , qγm ) obtained by S PLITA ND O RDER (q), DHQPS YNTHESIS builds a DHQP program constituted by m + 1 as follows: – The base rule has the form I0 (v) ← γ0 (w), where def (γ0 ) = qγ0 , each variable of w that is an input variable of qγ0 is underlined, and v is such that: if m = 0, then v coincides with x, otherwise v is constituted by all the variables in w that are in x, and all the variables in w in position of input arguments in qγ1 . – For j ∈ {1, . . . , m}, the j-th rule defining Ij based on Ij−1 has the form Ij (v) ← Ij−1 (u), γj (w) where def (γj ) = qγj , each variable of w that is an input variable of qγj is underlined, and v is such that if j = m, then v coincides with x, otherwise v is constituted by all the variables in w and u that are in x, and all the variables in w and u in position of input arguments in qγj+1 . Example 5. Consider the IHCQ Q presented in Example 1 and the sequence of queries B obtained by executing S PLITA ND O RDER(Q), as shown in Example 4. It is easy to see that, by executing DHQPS YNTHESIS(x,B), where x = (x, y, z) is the target tuple of Q, we obtain the DHQP program presented in Example 2. 4.4 The SBP Algorithm We are now ready to describe the algorithm SBP. As shown in the following, the algo- rithm iterates over the various IHCQs in IHUCQ input query Q. At each iteration, the IHCQ q under exam is first split into subqueries, that are ordered on the basis of bind- ings that the evaluation of a subquery provides for variables of subsequent subqueries. On the basis of the ordering of subqueries, q is then compiled into a DHQP program, and then such a program is evaluated over H. The result of such evaluation constitutes the resulting set of answers R. Algorithm SBP(Q,H) input IHUCQ Q of arity n, Hi(DL-LiteR ) KB H output Set of n-tuples of expressons in EDL-LiteR (S) begin R=∅ for each q ∈ Q B = S PLITA ND O RDER (q); P = DHQPS YNTHESIS (q,B); R = R ∪ E VALUATE DHQP(P , H) return R end The following theorem shows termination and correctness of the algorithm. Theorem 1. Given a Hi (DL-LiteR ) KB H and an IHUCQ q over H, the SBP(q, H) terminates and computes the certain answers of Q over H. It is also easy to characterize the complexity of SBP. The following theorem pro- vides such a charaterization. Theorem 2. Answering IHUCQs over Hi (DL-LiteR ) KBs with SBP is PT IME w.r.t. both instance and KB complexity, and NP-complete w.r.t. combined complexity. 5 Experiments In this section we show the results of a preliminary set of tests that were carried out to compare the performances of SBP with that of the algorithm C OMPUTE PMG presented at the end of Section 3. Such set of tests compares the evaluation time of the IHCQ q of Example 1 over four different extensions to the KB H in Figure 1. The extensions we have taken into account are the following. – Extension A. The ontology is obtained by adding to the ontology H 30000 assertions of the form Inst C (ki ,Alienware) and 30000 assertions of the form Inst C (ki ,iMac) where ki 6= kj for 1 ≤ i, j ≤ 60000 and i 6= j, and each ki is a new name not occurring in any of the assertions of ontology H; – Extension B. The ontology is obtained by adding to A the following set of asser- tions {Isa C (Tablet Model, Model), Isa C (iPad, Computer), Inst C (Tablet Model, Portable Category), Inst C (iPad, Tablet Model)}; – Extension C. The ontology is obtained by adding to B 30000 assertions of the form Inst C (ki ,iPad) where ki 6= kj for 1 ≤ i, j ≤ 30000 and i 6= j, and each ki is a new name not occurring in any of the assertions of ontology B; – Extension D. The ontology is obtained by adding to A the following set of as- sertions {Isa C (Server Model, Model), Isa C (RackR230, Computer), Inst C (Server Model, Non Portable Category), Inst C (RackR230, Server Model)}. The results of the tests are summarized in Table 1. Each row in the table shows the behaviors of the algorithms SBP and C OMPUTE PMG when computing the cer- tain answers to q over a given ontology. Columns in the table show: (i) the considered ontology, (ii) the number of concept expressions in its intensional assertions, (iii) the behaviour of SBP in terms of number of metaground IHCQs executed, and execution time, (iv) the beaviour of C OMPUTE PMG). SBP C OMPUTE PMG # metaground # metaground H |Concepts(H)| time2 time2 IHCQs IHCQs A 19 3 0, 16 sec. 361 14, 63sec. B 21 5 0, 25 sec. 441 19, 27 sec. C 21 5 0, 28 sec. 441 20, 64 sec. D 21 3 0, 16 sec. 441 19, 03 sec. Table 1: Comparison between SBP and C OMPUTE PMG First we notice that, when executed on the same ontology, SBP is up to 110 times faster then C OMPUTE PMG. The reason of that improvement is due to the smart ground- ing instantiation of the metavariables of q performed by SBP. For instance, to evaluate q over the ontology A, C OMPUTE PMG issues 361 metaground IHCQs obtained by considering all the substitutions in Concepts(A) for the two metavariables occurring in the atoms of q, thus leading to |Concepts(A)|2 = 192 = 361 metaground IHCQs that have to be evaluated whereas SBP induces the evaluation of only 3 metaground IHCQs. Second, we observe that the increase of the size of the ontology has different impacts on the two algorithms performances. Specifically, while the performance of C OMPUTE PMG worsen as soon as assertions with new expressions are added to the intensional part of the ontology, this does not necessarily happen for SBP. 6 Conclusion We have presented a new technique to answer IHCUQs over Hi (DL-LiteR ) KBs, which improves the one presented in [6] by avoiding the blind grounding of metavariables that may cause query answering to become impractical. We have illustrated a first set of experiments showing the advantage of using our strategy. In addition to the set of simple tests described above, at the time of this writing we are running a set of tests involving real world ontologies. These new experiments confirm that answering IHUCQs with the C OMPUTE PMG strategy can be very ineffi- cient, and that SBP is a promising direction to pursue. However, we are aware that there are other possible optimization that we can study to improve the work presented here. One notable example is the management of cy- cles in the dependency graph. Presently, we assign the same depth to all of the atoms occurring in a cycle but there are cases where breaking the cycle, thus assigning differ- ent depth to the involved atoms, can allow to further reduce the number of metaground queries that have to be considered during the evaluation of the correspondent program. Our future works will be focused on identifying those cases, and studying other possible optimization strategies. Acknowledgements: Work partially supported by the EU under FP7, project Optique (Scalable End-user Access to Big Data), grant n. FP7-318338. 2 The evaluation reported here refers to experiments carried out using M ASTRO [3] as reasoner computing the certain answers to metaground queries posed to DL-LiteR KBs References 1. G. Attardi and M. Simi. Consistency and completeness of OMEGA, a logic for knowledge representation. In Proc. of IJCAI’81, pages 504–510, 1981. 2. L. Badea. Reifying concepts in description logics. In Proc. of IJCAI’97, pages 142–147, 1997. 3. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, M. Rodriguez-Muro, R. Rosati, M. Ruzzi, and D. F. Savo. The Mastro system for ontology-based data access. Semantic Web J., 2(1):43–53, 2011. 4. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning, 39(3):385–429, 2007. 5. S. Colucci, T. Di Noia, E. Di Sciascio, F. M. Donini, and A. Ragone. Second-order de- scription logics: Semantics, motivation, and a calculus. In Proc. of DL 2010, volume 573 of CEUR, ceur-ws.org, 2010. 6. G. De Giacomo, M. Lenzerini, and R. Rosati. Higher-order description logics for domain metamodeling. In Proc. of AAAI 2011, 2011. 7. F. Di Pinto, G. De Giacomo, M. Lenzerini, and R. Rosati. Ontology-based data access with dynamic TBoxes in DL-Lite. In Proc. of AAAI 2012, 2012. 8. B. Glimm and C. Ogbuji. Sparql 1.1 entailment regimes. W3C reccomendation, W3C, 2013. Available at http://www.w3.org/TR/sparql11-entailment/. 9. B. Glimm, S. Rudolph, and J. Volker. Integrated metamodeling and diagnosis in owl 2. In Proc. of ISWC 2010, pages 257–272, 2010. 10. F. Lehmann, editor. Semantic Networks in Artificial Intelligence. Pergamon Press, Oxford (United Kingdom), 1992. 11. B. Motik. On the properties of metamodeling in OWL. J. of Logic and Computation, 17(4):617–637, 2007. 12. J. Mylopoulos, P. A. Bernstein, and H. K. T. Wong. A language facility for designing database-intensive applications. ACM Trans. on Database Systems, 5(2):185–207, 1980. 13. J. Z. Pan and I. Horrocks. OWL FA: a metamodeling extension of OWL DL. In Proc. of WWW 2006, pages 1065–1066, 2006.