Practical higher-order query answering over
               Hi (DL-LiteR ) knowledge bases

                 Maurizio Lenzerini, Lorenzo Lepore, Antonella Poggi

                   Dipartimento di Ingegneria Informatica, Automatica e
                 Gestionale “Antonio Ruberti”- Sapienza Università di Roma
                             Via Ariosto 25, I-00183 Roma, Italy
                             lastname@dis.uniroma1.it


       Abstract. The language Hi(DL-LiteR ) is obtained from DL-LiteR by adding
       meta-modeling features, and is equipped with a query language that is able to
       express higher-order queries. We investigate the problem of answering a par-
       ticular class of such queries, called instance higher-order queries posed over
       Hi(DL-LiteR ) knowledge bases (KBs). The only existing algorithm for this prob-
       lem is based on the idea of reducing the evaluation of a higher-order query Q
       over a Hi(DL-LiteR ) KB to the evaluation of a union of first-order queries over
       a DL-LiteR KB, built from Q by instantiating all metavariables in all possible
       ways. Even though of polynomial time complexity with respect to the size of the
       KB, this algorithm turns out to be inefficient in practice. In this paper we present a
       new algorithm, called Smart Binding Planner (SBP), that compiles Q into a pro-
       gram, that issues a sequence of first-order conjunctive queries, where each query
       has the goal of providing the bindings for meta-variables of the next ones, and the
       last one completes the process by computing the answers to Q. We also illustrate
       some experiments showing that, in practice, SBP is significantly more efficient
       than the previous approach.


1   Introduction
Description Logics (DLs) are popular formalisms for expressing ontologies, where an
ontology is regarded as a formal specification of the concepts that are relevant in the
domain of interest, together with the relationships between concepts. In many applica-
tions, the need arises of modeling and reasoning about metaconcepts and metaproper-
ties. Roughly speaking, a metaconcept is a concept whose instances can be themselves
concepts, and a metaproperty is a relationship between metaconcepts. Metaconcept rep-
resentation is needed, for example, in formal ontology, where specific metaproperties
(e.g., rigidity) are used to express relevant aspects of the intended meaning of the el-
ements in an ontology. More generally, the idea of representing concepts and prop-
erties at the metalevel has been exploited in several sub-fields of Knowledge Repre-
sentation and Computer Science, including semantic networks, early Frame-based and
Description-based systems [10, 1], and conceptual modeling languages [12]. The notion
of metaclass is also present in virtually all object-oriented languages, including modern
programming languages (see, for instance, the Java class “class”).
    To see an example of metamodeling, consider the domain of computers, where we
want to describe the properties of single computer machines (concept Computer in Fig-
ure 1), like the serial number and the manufacture date, the properties of the computer
models (Model), like the name and the country where computer machines of that mod-
els are produced, and the category (Category) of the various models. Thus, for instance,
one could assert that a specific computer machine (e.g. ko94-pol) is an instance of
Alienware (subconcept of Computer), where Alienware is an instance of Laptop model
(subconcept of Model), and Laptop model is an instance of Portable category (subcon-
cept of Category).

Inst C (ko94-po1, Alienware)                      Inst C (jh56i-09, iMac)
Inst C (Alienware, Laptop Model)                  Inst C (iMac, Desktop Model)
Inst C (Laptop Model, Portable Category)          Inst C (Desktop Model, Non Portable Category)
Inst R (ko94-po1, Alienware, has model)           Inst R (jh56i-09, iMac, has model)
Isa C (Alienware, Computer)                       Isa C (iMac, Computer)
Isa C (Computer,Exists(manufacture date))         Isa C (Computer,Exists(serial number))
Isa C (Exists(serial number),Computer)            Isa C (Exists(Inv (has model)), Model)
Isa C (Computer,Exists(has model))                Isa C (Exists(has model),Computer)
Isa C (Laptop Model, Model)                       Isa C (Desktop Model, Model)
Isa C (Model,Exists(has category))                Isa C (Exists(has category),Model)
Isa C (Exists(Inv (has category)), Category)      Isa C (Model,Exists(name))
Isa C (Exists(Inv (produced in)), Country)        Isa C (Model,Exists(produced in))
Isa C (Portable Category, Category)               Isa C (Non Portable Category, Category)
Disj C (Alienware, iMac)                          Disj C (Laptop Model, Desktop Model)
Disj C (Portable Category, Non Portable Category)

                                  (a) List of assertions                                          (b) Diagrammatic representation

                                     Fig. 1: Hi (DL-LiteR ) KB of computer domain


    Obviously, the benefit of using metamodeling is greatly limited if one cannot ex-
press metaqueries on the knowledge base. A metaquery is an expression that combines
predicates and metapredicates in order to extract complex patterns from the knowledge
base. In particular, in a metaquery, variables can appear in predicate positions. In the
example above, one might be interested in asking for all computer machines that are
instances of a model that is an instance of Portable category, thus traversing three levels
of the instance-of relation.
    It is well-known that in logic, higher-order constructs are needed for a correct rep-
resentation of concepts and properties at the meta-level. However, current research
on Description Logics only rarely addresses the issue of extending the language with
higher-order constructs (see, for instance, [2, 5, 11, 13, 9]). The starting point of our
investigation is the work presented in [6, 7], where the authors propose a method to
add medamodeling facilities to Description Logics, and study algorithms for answering
metaqueries in a particular logic, called Hi (DL-LiteR ). The logic Hi (DL-LiteR ) is ob-
tained from DL-LiteR [4] by adding suitable constructs for metaconcepts and metaprop-
erties modeling. Since the logics of the DL-Lite family are equipped with query an-
swering algorithms with nice computational properties with respect to the size of the
data, we believe that Hi (DL-LiteR ) is a good candidate for a language that can be
used in ontology-based data access applications requiring to model metaconcepts and
metaproperties.
    It was shown in [6] that answering higher-order unions of conjunctive queries
(HUCQs) over Hi (DL-LiteR ) KBs is in general intractable, while answering instance-
based HUCQ (IHUCQ) has the same complexity as answering standard (i.e., fist-order)
unions of conjunctive queries over DL-LiteR KBs.
    In particular, in [6] the authors present a query answering technique based on the
idea of first transforming an IHUCQ Q over a Hi (DL-LiteR ) KBs H into a IHUCQ
Q0 and then computing the certain answers to Q0 over H seen as a standard DL-LiteR
KB. The goal of the technique was essentially to show that the data complexity class of
query answering does not change if we add metamodeling to DL-LiteR . However, the
technique presented in the paper turns out to be impractical in real world cases. Indeed,
in most cases, the query answering algorithm computes the union of a huge number of
metaground IHCQs, most of which can be in fact empty. This is due to the fact that the
metagrounding is computed “blindly”, i.e., by essentially instantiating in all possible
ways all variables with the elements that are in predicate positions.
     In this paper, we present a new algorithm for answering IHUCQs posed over
Hi (DL-LiteR ) knowledge bases, called Smart Binding Planner (SBP). The basic idea
of the algorithm is to compile the query q into a program expressed in a Datalog-like
language, that issues a sequence of first-order conjunctive queries, where each query
has the goal of providing the bindings for meta-predicates of the next ones, and the last
one completes the process by computing the answers to q. We also illustrate some pre-
liminary experiments showing that, in practice, SBP is significantly more efficient than
the previous approach.
     Note that our query language corresponds to a subset of SPARQL 1.1 interpreted
under the OWL 2 direct semantics entailment regime [8], in particular the subset con-
stituted by unions of conjunctive queries using only atoms on the instance-of relations.
Thus, our algorithm can be seen as a first effective approach to answering such queries,
but in the context where the KB has richer meta-modeling features than just punning.
     The paper is organized as follows. In Section 2 we describe the Hi (DL-LiteR )
logic. In section 3 we discuss the problem of expressing and evaluating IHUCQs in
Hi (DL-LiteR ), and we briefly illustrate the algorithm presented in [6]. In Section 4 we
present our new algorithm, and Section 5 describes a set of experiments aiming at illus-
trating the behavior of our algorithm and at comparing it with the previous technique.
Section 6 concludes the paper.


2 Hi (DL-LiteR )
In the following we recall the basics of Hi (DL-LiteR ) [6], by first presenting its syntax
and then its semantics. Following [6], we can characterize a traditional DL L by a set
OP (L) of operators, used to form concept and role expressions, and a set of MP (L)
of metapredicates, used to form assertions. Each operator and each metapredicate have
an associated arity. If symbol S has arity n, then we write S/n to denote such a sym-
bol and its arity. For DL-LiteR , we have: (i) OP (DL-LiteR ) = {Inv /1, Exists/1},
which stand for “inverse role”, and “existential restriction”, (ii) MP (DL-LiteR ) =
{Inst C /2, Inst R /3, Isa C /2, Isa R /2, Disj C /2, Disj R /2}, which stand for instance,
isa and disjoint assertions on both concepts and roles.
Syntax. Given a countably infinite alphabet S of element names, we inductively define
the set of expressions for Hi (DL-LiteR ), denoted by EDL-LiteR (S), over the alphabet S
as follows:
 – if e ∈ S, then e ∈ EDL-LiteR (S);
 – if e ∈ EDL-LiteR (S), and e is not of the form Inv (e0 ) (where e0 is any expression),
   then Inv (e) ∈ EDL-LiteR (S);
 – if e ∈ EDL-LiteR (S), then Exists(e) ∈ EDL-LiteR (S).
    Intuitively, expressions denote elements, i.e., individuals, concepts and roles, of the
knowledge base. The names in S are the symbols denoting the atomic elements, while
the expressions denote either atomic elements, inverses of atomic elements (when in-
terpreted as roles), or projections of atomic elements on either the first or the second
component (again, when interpreted such elements as roles).
    The basic building blocks of a Hi (DL-LiteR ) knowledge base are assertions. A
DL-LiteR -assertion, or simply assertion, over the alphabet S for Hi (DL-LiteR ) is a
statement of one of the forms
                                  a1 (e1 , e2 ), a2 (e1 , e2 , e3 )
where a1 ∈ MP (DL-LiteR ) is either Inst C , Isa C , Isa R , Disj C , Disj R , a2 is Inst R ,
and e1 , e2 , e3 are expressions in EDL-LiteR (S). Thus, an assertion is simply an application
of a metapredicate to a set of expressions, which intuitively means that an assertion is
an axiom that predicates over a set of individuals, concepts or roles. A Hi(DL-LiteR )
knowledge base (KB) over S is a finite set of Hi (DL-LiteR )-assertions over S.
    For the sake of brevity, we do not include here the semantics of Hi (DL-LiteR ) and
we refer the reader to [7]. We refer to [7] also for the notions of expressions occurring
as respectively object,concept and role argument in one assertion.

3    Querying Hi (DL-LiteR ) knowledge bases
In this section we address the issue of querying Hi (DL-LiteR ) knowledge bases. Specif-
ically, we define queries over Hi (DL-LiteR ) knowledge bases and report on the only
query answering technique over such knowledge bases we are aware of. In the next
section, we will then propose our new technique for query answering.
    In order to define Hi (DL-LiteR ) queries, we first need to introduce the no-
tion of “atom”. We consider a countably infinite alphabet of variables V, dis-
joint from S, and a countably infinite alphabet of query predicates, each with its
arity, disjoint from all other alphabets. An atom over S and V (or simply, an
atom) has the form a1 (e1 , e2 ), a2 (e1 , e2 , e3 ) where a1 ∈ MP (DL-LiteR ) is either
Inst C , Isa C , Isa R , Disj C or Disj R , a2 is Inst R , and each ei is either an expression in
EDL-LiteR (S) or a variable in V, i.e., ei ∈ EDL-LiteR (S) ∪ V. In the above assertions, a1
and a2 are called the predicates of the atom.
    A higher-order conjunctive query (HCQ) of arity n is a formula of the form
q(u1 , . . . , un ) ← a1 , . . . , am where n ≥ 0, m ≥ 1, q is a query predicate of arity
n (which is also the arity of the query), each ai is an atom, and each ui is either in V
and occurs in some aj , or it is in EDL-LiteR (S). As usual, the tuple u = (u1 , . . . , un )
is called the target of the query q and the variables in u are its distinguished vari-
ables. The other variables of the query are called existential variables. Also, similarly
to names within assertions, we say that a variable occurs as concept argument or role
argument within an atom, depending on its position. Variables occurring as concept or
role arguments are called metavariables. A higher-order union of conjunctive queries
(HUCQ) of arity n is a set of HCQs of arity n with the same query predicate. Also a
HUCQ is metaground if it does not contain any metavariable.
    A HCQ is called instance higher-order conjunctive query (IHCQ) if each of its
atoms has Inst C and Inst R as predicate. A HUCQ containing only IHCQs is called
instance higher-order union of conjunctive queries (IHUCQ).
      Finally, a higher-order query is called Boolean if it has arity 0.

Example 1. Consider the Hi (DL-LiteR ) KB H in Figure 1. Suppose one needs to re-
trieve all items, together with their model and the model category, but only if the model
category is classified as ’Portable Computer’. The IHCQ expressing this need is the
following:
            q(x, y, z) ← Inst C (x, y), Inst C (y, z), Inst C (z,’Portable Category’)


     In order to define the semantics of queries, we now introduce the notion of assign-
ment. Indeed, to interpret non-ground terms, we need assignments over interpretations,
where an assignment µ over hΣ, Io i is a function µ : V → ∆. Given an interpreta-
tion I = hΣ, Io i and an assignment µ over I, the interpretation of terms is specified
by the function (·)Io ,µ : EDL-LiteR (S) ∪ V → ∆ defined as follows: (i) if e ∈ S then
eIo ,µ = eIo ; (ii) if e ∈ V then eIo ,µ = µ(e); (iii) op(e)Io ,µ = opIo (eIo ,µ ).
     Finally, we define the semantics of atoms, by defining the notion of satisfaction of
an atom with respect to an interpretation I and an assignment µ over I as follows:
  – I, µ |= Inst C (e1 , e2 ) if eI1 o ,µ ∈ (eI2 o ,µ )Ic ;
  – I, µ|=Inst R (e1 , e2 , e3 ) if heI1 o ,µ , eI2 o ,µ i∈(eI3 o ,µ )Ir ;
  – I, µ |= Isa C (e1 , e2 ) if (eI1 o ,µ )Ic ⊆ (eI2 o ,µ )Ic ;
  – I, µ |= Isa R (e1 , e2 ) if (eI1 o ,µ )Ir ⊆ (eI2 o ,µ )Ir ;
  – I, µ |= Disj C (e1 , e2 ) if (eI1 o ,µ )Ic ∩ (eI2 o ,µ )Ic = ∅;
  – I, µ |= Disj R (e1 , e2 ) if (eI1 o ,µ )Ir ∩ (eI2 o ,µ )Ir = ∅.
     Let I be an interpretation and µ an assignment over I. A Boolean HCQ q of the
form q ← a1 , . . . , an is satisfied in I, µ if every query atom ai is satisfied in I, µ.
Given a Boolean HCQ q and a Hi (DL-LiteR ) KB H, we say that q is logically implied
by H (denoted by H |= q) if for each model I of H there exists an assignment µ such
that q is satisfied by I, µ. Given a non-Boolean HCQ q of the form q(e1 , . . . , en ) ←
a1 , . . . , am , a grounding substitution of q is a substitution θ such that e1 θ, . . . , en θ are
ground terms. We call e1 θ, . . . , en θ a grounding tuple. The set of certain answers to
q in H, denoted by cert(q, H), is the set of grounding tuples e1 θ, . . . , en θ that make
the Boolean query qθ ← a1 θ, . . . , an θ logically implied by H. These notions extend
immediately to HUCQs.
     As we said in the introduction, in [6] the authors present a query answering tech-
nique based on the idea of first transforming a IHUCQ Q over a Hi (DL-LiteR ) KBs
H into a metaground IHUCQ Q0 and then computing the certain answers to Q0 over H
seen as a standard DL-LiteR KB.1 It was shown that query answering with such a tech-
nique is in AC0 w.r.t. the number of instance assertions, in PT IME w.r.t. KB complexity,
and NP-complete w.r.t. combined complexity. While describing, in the following, the
technique of [6] in more details, we introduce notions that we use in the next sections.
     Given a Hi (DL-LiteR ) KB H, we respectively denote by Roles(H) and
Concepts(H) the sets Roles(H) = {e, Inv (e)|e ∈ EDL-LiteR (S) and e occurs as role
 1
     In fact, in [6], the authors consider KBs equipped with mappings, and therefore their technique
     aims at rewriting the initial query over the actual data sources. Here, we ignore this aspect and
     adapt their algorithm to our scenario.
argument in H} and Concepts(H) = {e | e ∈ EDL-LiteR (S) and e occurs as concept
argument in H} ∪ {Exists(e), Exists(Inv (e)) | e ∈ EDL-LiteR (S) and e occurs as role
argument in H}. Given two IHCQs q, q 0 and a KB H, we say that q 0 is a partial meta-
grounding of q with respect to H if q 0 = σ(q) where σ is a partial substitution of the
metavariables of q such that for each metavariable x of q, either σ(x) = x, or: (i) if
x occurs in concept position in q, then σ(x) ∈ Concepts(H); (ii) if x occurs in role
position in q, then σ(x) ∈ Roles(H). Given a IHCQ q and a KB H, we denote by
P M G(q, H) the IHUCQ constituted by the union of all the partial metagroundings of
q w.r.t. H. Moreover Sgiven a IHUCQ Q and a KB H, we denote by P M G(Q, H) the
IHUCQ formed by q∈Q P M G(q, H).
    With these notions in place, the algorithm of [6], which we denote as C OM -
PUTE PMG, can be described as follows. Given a IHUCQ Q and a Hi (DL-LiteR ) KB
H, it first computes the query P M G(Q, H) and then returns the certain answers to
the query P M G(Q, H) with respect to H. It is worth noting that, although polynomial
w.r.t. the number of instance assertions in H, C OMPUTE PMG can be very inefficient
in practice, in the sense that it computes the union of a huge number of IHCQs, most
of which can be in fact empty. This is due to the fact that the partial metagrounding
is computed “blindly”, i.e., by instantiating in all possible ways, all metavariables with
elements of Concepts(H) and Roles(H).


4     New algorithm for answering IHUCQs in Hi (DL-LiteR )

In this section we present the Smart Binding Planner (SBP) algorithm, aiming at re-
ducing the number of metaground queries to be evaluated. The algorithm is based on
a process that computes a sequence of metaground IHCQs, where each query has the
goal of providing both the answers to the query, and the bindings for metavariables of
the next ones.
     Before delving into the details of the SBP algorithm, we shortly sketch its three
main steps. First, it splits the query into a sequence of subqueries (function S PLITA N -
D O RDER ) such that the evaluation of the i-th subquery provides the bindings to instan-
tiate the metavariables of the (i + 1)-th subquery. Second, based on such subqueries
ordering, it builds a program (function DHQPS YNTHESIS), expressed in a specific lan-
guage named Datalog-based Higher Order Query Plan (DHQP). Then, as third and
final step, it evaluates the DHQP program (function E VALUATE DHQP).
     In the following, we start by presenting the DHQP language and illustrating the
function E VALUATE DHQP. We then define the two other functions that are used by
SBP, namely the functions S PLITA ND O RDER and DHQPS YNTHESIS, and finally we
present the complete algorithm SBP, and discuss its properties.


4.1   The DHQP language

Let AΓ and AI be two pairwise disjoint alphabets, disjoint from S and V, called, re-
spectively, the alphabet of bridge predicates and the alphabet of intensional predicates,
each one with an associated arity. Intensional and bridge predicates are used to form a
DHQP program, which, intuitively, is constitued by an ordered set of Datalog-like rules
whose head predicate is an intensional predicate in AI , and whose body contains an
atom with a bridge predicate in AΓ . Every bridge predicate γ of arity n has an asso-
ciated IHCQ of arity n, denoted def (γ). Moreover, some of the n arguments of each
bridge predicate γ are classified as “to-be-bound”. When we write a bridge predicate,
we indicate the variables corresponding to such arguments by underlining them. Intu-
itively, the underlined variables of the bridge predicate γ are those variables that are to
be bound to ground expressions whenever we want to evaluate the query def (γ). Also,
the bridge predicate is used to denote the relation where we store the certain answers
of a set of metaground IHCQs, each one obtained by def (γ) by substituting the un-
derlined variables of γ with a ground expression. Variables of the query def (γ) that
appear underlined in the corresponding bridge predicate γ are called input variables of
the query. Note that, when we write the query predicate of def (γ), we underline the
query input variables.
     We now provide the definition of the syntax of a DHQP program. Let m be a non-
negative integer, let Γ be a sequence (γ0 , . . . , γm ) of m + 1 bridge predicates in AΓ ,
and let I be a sequence (I0 , . . . , Im ) of m + 1 intensional predicates in AI .
     A DHQP program P over Γ and I is a sequence of m + 1 DHQP rules
r0 , r1 , . . . , rm , such that
   – r0 , called the base rule of P , has the form I0 (v) ← γ0 (w), where every variable
      in v occurs in w.
   – for each 1 ≤ j ≤ m, rule rj has the form Ij (v) ← Ij−1 (u), γj (w), where every
      variable in v occurs in u or w, and every variable in u which is not an underlined
      variable in w appears in v.
     For every 0 ≤ j ≤ m, we say that rj defines Ij using γj . In the case of 1 ≤ j ≤ m,
we say that rj defines Ij using γj based on Ij−1 .

Example 2. Let I = (I0 /1, I1 /2, I2 /3), and Γ = (γ0 /1, γ1 /2, γ2 /2), with qγ0 , qγ1 ,
and qγ2 defined as follows:
  – qγ0 (z) = Inst C (z, Portable Category),
  – qγ1 (z, y) = Inst C (y, z),
  – qγ2 (y, y) = Inst C (x, y).
The following is a DHQP program over Γ and I, called P :

                 r0 : I0 (z) ← γ0 (z)
                 r1 : I1 (y, z) ← I0 (z), γ1 (z, y)
                 r2 : I2 (x, y, z) ← I1 (y, z), γ2 (y, x)

    We now turn our attention to the semantics of a DHQP program. Toward this goal,
we introduce the notion of instantiation of an IHCQ. Given an IHCQ q, an n-tuple x of
variables of q, and an n-tuple t of expressions in EDL-LiteR (S), the x-instantiation of q
with t, denoted IN ST (q, x ← t), is the IHCQ obtained from q by substituting each xi
in x with the expression ti in t, for i ∈ {1, . . . , n}. Note that a partial metagrounding
of q with respect to H is an x-instantiation of q with any n-tuple t = (t1 , t2 , . . . , tn ),
where x = (x1 , x2 , . . . , xn ) contains only metavariables in q, and such that, for every
i ∈ {1, . . . , n}, ti ∈ Concept(H) if xi occurs as concept argument within q, and
ti ∈ Role(H) if xi occurs as role argument.
    Let P be a DHQP program constituted by the rules (r0 , . . . , rm ). We specify the
semantics of P operationally, i.e., we define an algorithm, called E VALUATE DHQP,
that, given as input a DHQP program P , and a Hi (DL-LiteR ) KB H, computes the
result of evaluating P with respect to a H as follows:
  – Consider rule r0 , and compute the extension of I0 as the certain answers to the
     query P M G(qγ0 , H) over the KB H.
  – For each 1 ≤ i ≤ m, consider rule ri , and compute the extension of Ii by eval-
     uating the Datalog rule Ii (v) ← Ii−1 (u), γi (w) , where the extension of Ii−1
     is the result computed when considering rule ri−1 , and the extension of γi is con-
                                                            0
                                 S of the query P M G(q 0, H) over the KB0 H, with
     stituted by the certain answers
       0
     q constructed as follows: t∈π 0 (Ii−1 ) IN ST (qγi , w ← t), where w denotes
                                      w
     the input variables of qγi occurring both in u and in w, and πw0 (R) denotes the
     projection of relation R onto the arguments corresponding to w0 .
  – Return the extension of the predicate Im as the result of evaluating P over H.
Example 3. Consider the DHQP program P described in Example 2, and the KB H in
Figure 1. The algorithm E VALUATE DHQP(P, H) proceeds as follows. First, it consid-
ers rule r0 and computes the certain answers of the metaground query qγ0 with respect
to H, which, in this case, produces the following result: {(Laptop Model)}. By means
of the rule r0 , this result becomes also the extension of I0 . Then, it considers rule r1 ,
and, for each tuple of I0 (in this case, only one) evaluates the query resulting from the z-
instantiation of qγ1 with the tuple itself; this means that the certain answers to the meta-
ground query q(y) ← Inst C (y, Laptop Model) (i.e., {(Alienware)}) are computed and
stored in the relation γ1 (z, y). Such certain answers are used in rule r1 to compute the
extension of I1 , which becomes {(Laptop Model, Alienware)}. Finally, the algorithm
considers r2 , and computes the final result {(ko94-po1, Alienware, Laptop Model)}.


4.2   The function S PLITA ND O RDER
Given an IHCQ q with distinguished variables x, S PLITA ND O RDER returns a sequence
of subqueries of q by proceeding as follows.
 1. It builds the dependency graph of q. The dependency graph of q is a directed graph
    whose nodes are the atoms of q, and the edges are formed according to the fol-
    lowing rule: there is an edge from an atom a1 to an atom a2 if it at least one of
    the following is verified: (i) a2 depends on a1 , i.e., if the expression occurring as
    predicate argument in a2 is a variable occurring as object argument in a1 ; (ii) there
    exists a variable x of Q that is not a metavariable and x occurs as object argument
    in both a1 and a2 (note that in this case, there will be an edge also from a1 to a2 ).
    Intuitively, a2 depends on a1 , if by evaluating the query with body a1 , one obtains
    bindings for the metavariable of a2 , which allow to instantiate and then evaluate
    the subquery with body a2 .
 2. It assigns to each atom a a depth d(a) by using the following breadth-first strategy:
     (a) k ← 0;
     (b) do the following until the graph is empty:
         2.1 assign k to each atom n occurring in a strongly connected component C
              such that every atom in C has all of its incoming edges coming from C;
         2.2 remove from the graph all the considered atoms and their edges;
         2.3 k ← k + 1.
 3. For every atom a, it marks every variable that is a metavariable and occurs in an
    atom a0 such that d(a0 ) < d(a). Intuitively, if a variable within an atom is marked,
    then bindings for such a variable will be provided in order to evaluate the query
    with body that atom by using a standard DL-LiteR reasoner.
 4. For every i ∈ {0, m}, where m is the maximal atom depth, it defines an IHCQ
    qγi whose arity is the number of variables occurring in some atom at depth i, and
    whose form is qγi (uγi , ui ) ← bγi , where uγi contains all variables that are marked
    in some atom at depth i, ui contains all unmarked variables occurring in some atom
    at depth i that also belong to x or, for i < m are marked in some atom at depth
    i + 1, and bγi is the conjunction of all atoms at depth i. Intuitively, atoms having
    the same depth will be evaluated in the same query.
    The queries qγ0 , . . . , qγm built by S PLITA ND O RDER(q) will be used by the function
DHQPS YNTHESIS to construct the DHQP program whose evaluation will return the
answers to q.
Example 4. Consider the IHCQ presented in Example 1. The result of the S PLITA N -
D O RDER function is the sequence B = (qγ0 , qγ1 , qγ2 ) presented in Example 2.


4.3   The function DHQPS YNTHESIS
Given a IHCQ q with target tuple x, and the sequence (qγ0 , qγ1 , . . . , qγm ) obtained
by S PLITA ND O RDER (q), DHQPS YNTHESIS builds a DHQP program constituted by
m + 1 as follows:
 – The base rule has the form I0 (v) ← γ0 (w), where def (γ0 ) = qγ0 , each variable
    of w that is an input variable of qγ0 is underlined, and v is such that: if m = 0, then
    v coincides with x, otherwise v is constituted by all the variables in w that are in
    x, and all the variables in w in position of input arguments in qγ1 .
 – For j ∈ {1, . . . , m}, the j-th rule defining Ij based on Ij−1 has the form Ij (v) ←
    Ij−1 (u), γj (w) where def (γj ) = qγj , each variable of w that is an input variable
    of qγj is underlined, and v is such that if j = m, then v coincides with x, otherwise
    v is constituted by all the variables in w and u that are in x, and all the variables
    in w and u in position of input arguments in qγj+1 .
Example 5. Consider the IHCQ Q presented in Example 1 and the sequence of queries
B obtained by executing S PLITA ND O RDER(Q), as shown in Example 4. It is easy to
see that, by executing DHQPS YNTHESIS(x,B), where x = (x, y, z) is the target tuple
of Q, we obtain the DHQP program presented in Example 2.


4.4   The SBP Algorithm
We are now ready to describe the algorithm SBP. As shown in the following, the algo-
rithm iterates over the various IHCQs in IHUCQ input query Q. At each iteration, the
IHCQ q under exam is first split into subqueries, that are ordered on the basis of bind-
ings that the evaluation of a subquery provides for variables of subsequent subqueries.
On the basis of the ordering of subqueries, q is then compiled into a DHQP program,
and then such a program is evaluated over H. The result of such evaluation constitutes
the resulting set of answers R.
Algorithm SBP(Q,H)
input IHUCQ Q of arity n, Hi(DL-LiteR ) KB H
output Set of n-tuples of expressons in EDL-LiteR (S)
begin
  R=∅
  for each q ∈ Q
    B = S PLITA ND O RDER (q);
    P = DHQPS YNTHESIS (q,B);
    R = R ∪ E VALUATE DHQP(P , H)
  return R
end

The following theorem shows termination and correctness of the algorithm.

Theorem 1. Given a Hi (DL-LiteR ) KB H and an IHUCQ q over H, the SBP(q, H)
terminates and computes the certain answers of Q over H.

    It is also easy to characterize the complexity of SBP. The following theorem pro-
vides such a charaterization.

Theorem 2. Answering IHUCQs over Hi (DL-LiteR ) KBs with SBP is PT IME w.r.t.
both instance and KB complexity, and NP-complete w.r.t. combined complexity.


5    Experiments
In this section we show the results of a preliminary set of tests that were carried out to
compare the performances of SBP with that of the algorithm C OMPUTE PMG presented
at the end of Section 3. Such set of tests compares the evaluation time of the IHCQ q of
Example 1 over four different extensions to the KB H in Figure 1. The extensions we
have taken into account are the following.
  – Extension A. The ontology is obtained by adding to the ontology H 30000
     assertions of the form Inst C (ki ,Alienware) and 30000 assertions of the form
     Inst C (ki ,iMac) where ki 6= kj for 1 ≤ i, j ≤ 60000 and i 6= j, and each ki is
     a new name not occurring in any of the assertions of ontology H;
  – Extension B. The ontology is obtained by adding to A the following set of asser-
     tions {Isa C (Tablet Model, Model), Isa C (iPad, Computer), Inst C (Tablet Model,
     Portable Category), Inst C (iPad, Tablet Model)};
  – Extension C. The ontology is obtained by adding to B 30000 assertions of the
     form Inst C (ki ,iPad) where ki 6= kj for 1 ≤ i, j ≤ 30000 and i 6= j, and each ki is
     a new name not occurring in any of the assertions of ontology B;
  – Extension D. The ontology is obtained by adding to A the following set of as-
     sertions {Isa C (Server Model, Model), Isa C (RackR230, Computer), Inst C (Server
     Model, Non Portable Category), Inst C (RackR230, Server Model)}.
     The results of the tests are summarized in Table 1. Each row in the table shows
the behaviors of the algorithms SBP and C OMPUTE PMG when computing the cer-
tain answers to q over a given ontology. Columns in the table show: (i) the considered
ontology, (ii) the number of concept expressions in its intensional assertions, (iii) the
behaviour of SBP in terms of number of metaground IHCQs executed, and execution
time, (iv) the beaviour of C OMPUTE PMG).
                                                 SBP                  C OMPUTE PMG
                                       # metaground               # metaground
                  H    |Concepts(H)|                   time2                     time2
                                          IHCQs                      IHCQs
                  A         19               3       0, 16 sec.       361      14, 63sec.
                  B         21               5       0, 25 sec.       441      19, 27 sec.
                  C         21               5       0, 28 sec.       441      20, 64 sec.
                  D         21               3       0, 16 sec.       441      19, 03 sec.
                  Table 1: Comparison between SBP and C OMPUTE PMG


    First we notice that, when executed on the same ontology, SBP is up to 110 times
faster then C OMPUTE PMG. The reason of that improvement is due to the smart ground-
ing instantiation of the metavariables of q performed by SBP. For instance, to evaluate
q over the ontology A, C OMPUTE PMG issues 361 metaground IHCQs obtained by
considering all the substitutions in Concepts(A) for the two metavariables occurring
in the atoms of q, thus leading to |Concepts(A)|2 = 192 = 361 metaground IHCQs
that have to be evaluated whereas SBP induces the evaluation of only 3 metaground
IHCQs. Second, we observe that the increase of the size of the ontology has different
impacts on the two algorithms performances. Specifically, while the performance of
C OMPUTE PMG worsen as soon as assertions with new expressions are added to the
intensional part of the ontology, this does not necessarily happen for SBP.


6      Conclusion
We have presented a new technique to answer IHCUQs over Hi (DL-LiteR ) KBs, which
improves the one presented in [6] by avoiding the blind grounding of metavariables that
may cause query answering to become impractical.
    We have illustrated a first set of experiments showing the advantage of using our
strategy. In addition to the set of simple tests described above, at the time of this writing
we are running a set of tests involving real world ontologies. These new experiments
confirm that answering IHUCQs with the C OMPUTE PMG strategy can be very ineffi-
cient, and that SBP is a promising direction to pursue.
    However, we are aware that there are other possible optimization that we can study
to improve the work presented here. One notable example is the management of cy-
cles in the dependency graph. Presently, we assign the same depth to all of the atoms
occurring in a cycle but there are cases where breaking the cycle, thus assigning differ-
ent depth to the involved atoms, can allow to further reduce the number of metaground
queries that have to be considered during the evaluation of the correspondent program.
Our future works will be focused on identifying those cases, and studying other possible
optimization strategies.

Acknowledgements: Work partially supported by the EU under FP7, project Optique
(Scalable End-user Access to Big Data), grant n. FP7-318338.


 2
     The evaluation reported here refers to experiments carried out using M ASTRO [3] as reasoner
     computing the certain answers to metaground queries posed to DL-LiteR KBs
References
 1. G. Attardi and M. Simi. Consistency and completeness of OMEGA, a logic for knowledge
    representation. In Proc. of IJCAI’81, pages 504–510, 1981.
 2. L. Badea. Reifying concepts in description logics. In Proc. of IJCAI’97, pages 142–147,
    1997.
 3. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, M. Rodriguez-Muro,
    R. Rosati, M. Ruzzi, and D. F. Savo. The Mastro system for ontology-based data access.
    Semantic Web J., 2(1):43–53, 2011.
 4. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning
    and efficient query answering in description logics: The DL-Lite family. J. of Automated
    Reasoning, 39(3):385–429, 2007.
 5. S. Colucci, T. Di Noia, E. Di Sciascio, F. M. Donini, and A. Ragone. Second-order de-
    scription logics: Semantics, motivation, and a calculus. In Proc. of DL 2010, volume 573 of
    CEUR, ceur-ws.org, 2010.
 6. G. De Giacomo, M. Lenzerini, and R. Rosati. Higher-order description logics for domain
    metamodeling. In Proc. of AAAI 2011, 2011.
 7. F. Di Pinto, G. De Giacomo, M. Lenzerini, and R. Rosati. Ontology-based data access with
    dynamic TBoxes in DL-Lite. In Proc. of AAAI 2012, 2012.
 8. B. Glimm and C. Ogbuji. Sparql 1.1 entailment regimes. W3C reccomendation, W3C, 2013.
    Available at http://www.w3.org/TR/sparql11-entailment/.
 9. B. Glimm, S. Rudolph, and J. Volker. Integrated metamodeling and diagnosis in owl 2. In
    Proc. of ISWC 2010, pages 257–272, 2010.
10. F. Lehmann, editor. Semantic Networks in Artificial Intelligence. Pergamon Press, Oxford
    (United Kingdom), 1992.
11. B. Motik. On the properties of metamodeling in OWL. J. of Logic and Computation,
    17(4):617–637, 2007.
12. J. Mylopoulos, P. A. Bernstein, and H. K. T. Wong. A language facility for designing
    database-intensive applications. ACM Trans. on Database Systems, 5(2):185–207, 1980.
13. J. Z. Pan and I. Horrocks. OWL FA: a metamodeling extension of OWL DL. In Proc. of
    WWW 2006, pages 1065–1066, 2006.