=Paper= {{Paper |id=Vol-1720/full2 |storemode=property |title=Conjunctive Query Answering via a Fragment of Set Theory |pdfUrl=https://ceur-ws.org/Vol-1720/full2.pdf |volume=Vol-1720 |authors=Domenico Cantone,Marianna Nicolosi-Asmundo,Daniele Francesco Santamaria |dblpUrl=https://dblp.org/rec/conf/ictcs/CantoneAS16 }} ==Conjunctive Query Answering via a Fragment of Set Theory== https://ceur-ws.org/Vol-1720/full2.pdf
Conjunctive Query Answering via a Fragment of
                Set Theory?

              Domenico Cantone, Marianna Nicolosi-Asmundo, and
                        Daniele Francesco Santamaria

         University of Catania, Dept. of Mathematics and Computer Science
               email: {cantone,nicolosi,santamaria}@dmi.unict.it



       Abstract. We address the problem of Conjunctive Query Answering
                                                               ×
       (CQA) for the description logic DLh4LQSR,× i(D) (DL4,  D , for short) which
                                     R
       extends the logic DLh4LQS i(D) with Boolean operations on concrete
       roles and with the product of concepts.
                                                   ×                           4,×
       The result is obtained by formalizing DL4, D -knowledge bases and DLD -
       conjunctive queries in terms of formulae of the four-level set-theoretic
       fragment 4LQSR , which admits a restricted form of quantification on
       variables of the first three levels and on pair terms. We solve the CQA
                          ×
       problem for DL4, D    through a decision procedure for the satisfiability
       problem of 4LQSR . We further define a KE-tableau based procedure
       for the same problem, more suitable for implementation purposes, and
       analyze its computational complexity.


1    Introduction
In the last few years, results from Computable Set Theory have been used as a
means to represent and reason about description logics and rule languages for
the semantic web. For instance, in [4–6], fragments of set theory with constructs
related to multi-valued maps have been studied and applied to the realm of knowl-
edge representation. In [8], an expressive description logic, called DLhMLSS×     2,m i,
                                                                       ×
has been introduced and the consistency problem for DLhMLSS2,m i-knowledge
bases has been proved NP-complete. The description logic DLhMLSS×            2,m i has
been extended with additional constructs and SWRL rules in [6], proving that
the decision problem for the resulting logic, called DLh∀π0,2 i, is still NP-complete
under suitable conditions. The description logic DLh∀π0,2 i has been extended with
some metamodelling features in [4]. In [7], the description logic DLh4LQSR i(D)
(more simply referred to as DL4D ) has been introduced. DL4D can be represented
?
  This work has been partially supported by the Polish National Science Centre research
  project DEC-2011/02/A/HS1/00395.
Copyright c by the paper’s authors. Copying permitted for private and academic pur-
poses.
V. Biló, A. Caruso (Eds.): ICTCS 2016, Proceedings of the 17th Italian Conference
on Theoretical Computer Science, 73100 Lecce, Italy, September 7–9 2016, pp. 23–35
published in CEUR Workshop Proceedins Vol-1720 at http://ceur-ws.org/Vol-1720
24      D. Cantone, M. Nicolosi-Asmundo, D.F. Santamaria

in the decidable four-level stratified fragment of set theory 4LQSR involving a
restricted form of quantification over variables of the first three levels and pair
terms (cf. [2]). The logic DL4D admits concept constructs such as full negation,
union and intersection of concepts, concept domain and range, existential quan-
tification and min cardinality on the left-hand side of inclusion axioms. It also
supports role constructs such as role chains on the left hand side of inclusion
axioms, union, intersection, and complement of abstract roles, and properties
on roles such as transitivity, symmetry, reflexivity, and irreflexivity. It admits
datatypes, a simple form of concrete domains that are relevant in real world
applications.

    The consistency problem for DL4D -knowledge bases has been proved decidable
in [7] by means of a reduction to the satisfiability problem for 4LQSR , proved
decidable in [2]. It has also been proved, under not very restrictive constraints,
that the consistency problem for DL4D -knowledge bases is NP-complete. The
latter result has practical outcomes since, for example, the ontology Ontoceramic
[9] can be expressed in such a restricted version of DL4D . Finally, we mention that
the papers [4–8] are concerned with traditional research issues for description
logics mainly focused on the parts of a knowledge base representing conceptual
information, namely the TBox and the RBox, where the principal reasoning
services are subsumption and satisfiability.

    In this paper we exploit decidability results presented in [2, 7] to deal with
reasoning services for knowledge bases involving ABoxes. The most basic service
to query the instance data is instance retrieval, i.e., the task of retrieving all
individuals that instantiate a class C, and, dually, all named classes C that an
individual belongs to. In particular, a powerful way to query ABoxes is the
Conjunctive Query Answering task (CQA). CQA is relevant in the context of
description logics and, in particular, for real world applications based on semantic
web technologies, since it provides a mechanism allowing users and applications
to interact with ontologies and data. The task of CQA has been studied for
several well-known description logics (cf. [1, 13, 15]).

    In particular, we introduce the description logic DLh4LQSR,× i(D) (DL4,×
                                                                           D , for
short), extending DL4D with Boolean operations on concrete roles and with the
product of concepts. Then we define the CQA problem for DL4,×    D and prove its
decidability via a reduction to the CQA problem for 4LQSR , whose decidability
follows from that of the satisfiability problem for 4LQSR (proved in [2]). Finally,
we present a KE-tableau based procedure that, given a DL4,×    D -query Q and a
    4,×
DLD -knowledge base KB represented in set-theoretic terms, determines the
answer set of Q with respect to KB, providing also some complexity results. The
choice of the KE-tableau system [10] is motivated by the fact that this variant
of the tableau method allows one to construct trees whose distinct branches
define mutually exclusive situations thus avoiding the proliferation of redundant
branches, typical of semantic tableaux.
                    Conjunctive Query Answering via a Fragment of Set Theory                   25

2      Preliminaries
2.1      The set-theoretic fragment 4LQSR
It is convenient to first introduce the syntax and semantics of a more general
four-level quantified language, denoted 4LQS. Then we provide some restrictions
on quantified formulae of 4LQS that characterize 4LQSR . We recall that the
satisfiability problem for 4LQSR has been proved decidable in [2].
    4LQS involves four collections, Vi , of variables of sort i, for i = 0, 1, 2, 3.
Variables of sort i, for i = 0, 1, 2, 3, will be denoted by X i , Y i , Z i , . . . (in particular,
variables of sort 0 will also be denoted by x, y, z, . . .). In addition to variables,
4LQS involves also pair terms of the form hx, yi, with x, y ∈ V0 .
4LQS-quantifier-free atomic formulae are classified as:
  - level 0: x = y, x ∈ X 1 , hx, yi = X 2 , hx, yi ∈ X 3 ;
  - level 1: X 1 = Y 1 , X 1 ∈ X 2 ;
  - level 2: X 2 = Y 2 , X 2 ∈ X 3 .
4LQS purely universal formulae are classified as:
  - level 1: (∀z1 ) . . . (∀zn )ϕ0 , where z1 , . . . , zn ∈ V0 and ϕ0 is any propositional
    combination of quantifier-free atomic formulae of level 0;
  - level 2: (∀Z11 ) . . . (∀Zm
                              1
                                )ϕ1 , where Z11 , . . . , Zm
                                                           1
                                                             ∈ V1 and ϕ1 is any propositional
    combination of quantifier-free atomic formulae of levels 0 and 1, and of purely
    universal formulae of level 1;
  - level 3: (∀Z12 ) . . . (∀Zp2 )ϕ2 , where Z12 , . . . , Zp2 ∈ V2 and ϕ2 is any proposi-
    tional combination of quantifier-free atomic formulae and of purely universal
    formulae of levels 1 and 2.
4LQS-formulae are all the propositional combinations of quantifier-free atomic
formulae of levels 0, 1, 2 and of purely universal formulae of levels 1, 2, 3.
    Let ϕ be a 4LQS-formula. Without loss of generality, we can assume that
ϕ contains only ¬, ∧, ∨ as propositional connectives. Further, let Sϕ be the
syntax tree for a 4LQS-formula ϕ,1 and let ν be a node of Sϕ . We say that a
4LQS-formula ψ occurs within ϕ at position ν if the subtree of Sϕ rooted at ν
is identical to Sψ . In this case we refer to ν as an occurrence of ψ in ϕ and to
the path from the root of Sϕ to ν as its occurrence path. An occurrence of ψ
within ϕ is positive if its occurrence path deprived by its last node contains an
even number of nodes labelled by a 4LQS-formula of type ¬χ. Otherwise, the
occurrence is said to be negative.
    The variables z1 , . . . , zn are said to occur quantified in (∀z1 ) . . . (∀zn )ϕ0 . Like-
wise, Z11 , . . . , Zm
                     1
                       and Z12 , . . . , Zp2 occur quantified in (∀Z11 ) . . . (∀Zm
                                                                                  1
                                                                                    )ϕ1 and in
    2              2
(∀Z1 ) . . . (∀Zp )ϕ2 , respectively. A variable occurs free in a 4LQS-formula ϕ if it
does not occur quantified in any subformula of ϕ. For i = 0, 1, 2, 3, we denote
with Vari (ϕ) the collections of variables of level i occurring free in ϕ.
 1
     The notion of syntax tree for 4LQS-formulae is similar to the notion of syntax tree
     for formulae of first-order logic. A precise definition of the latter can be found in [11].
26      D. Cantone, M. Nicolosi-Asmundo, D.F. Santamaria

     A (level 0) substitution σ := {x1 /y1 , . . . , xn /yn } is the mapping ϕ 7→ ϕσ such
that, for any given 4LQS-formula ϕ, ϕσ is the 4LQS-formula obtained from ϕ by
replacing the free occurrences of the variables x1 , . . . , xn in ϕ with the variables
y1 , . . . , yn , respectively. We say that a substitution σ is free for ϕ if the formulae
ϕ and ϕσ have exactly the same occurrences of quantified variables.
     A 4LQS-interpretation is a pair M = (D, M ) where D is a non-empty collec-
tion of objects (called domain or universe of M) and M is an assignment over
the variables in Vi , for i = 0, 1, 2, 3, such that: M X 0 ∈ D, M X 1 ∈ P(D), M X 2 ∈
P(P(D)), M X 3 ∈ P(P(P(D))), where X i ∈ Vi , for i = 0, 1, 2, 3, and P(s) de-
notes the powerset of s.
Pair terms are interpreted à la Kuratowski, and therefore we put
                        M hx, yi := {{M x}, {M x, M y}}.
Next, let
  - M = (D, M ) be a 4LQS-interpretation,
  - x1 , . . . , xn ∈ V0 , X11 , . . . , Xm 1
                                              ∈ V1 , X12 , . . . , Xp2 ∈ V2 , and
  - u1 , . . . , un ∈ D, U1 , . . . , Um ∈ P(D), U12 , . . . , Up2 ∈ P(P(D)).
                            1             1


By M[~x/~u, X  ~ 1 /U
                    ~ 1, X~ 2 /U
                               ~ 2 ], we denote the interpretation M0 = (D, M 0 ) such
that M xi = ui (for i = 1, . . . , n), M 0 Xj1 = Uj1 (for j = 1, . . . , m), M 0 Xk2 =
         0

Uk2 (for k = 1, . . . , p), and which otherwise coincides with M on all remaining
variables. For a 4LQS-interpretation M = (D, M ) and a 4LQS-formula ϕ, the
satisfiability relationship M |= ϕ is defined inductively over the structure of
ϕ as follows. Quantifier-free atomic formulae are evaluated in a standard way
according to the usual meaning of the predicates ‘∈’ and ‘=’, and purely universal
formulae are evaluated as follows:
  - M |= (∀z1 ) . . . (∀zn )ϕ0 iff M[~z/~u] |= ϕ0 , for all ~u ∈ Dn ;
                                        ~ 1 /U                     ~ 1 ∈ P(D) m ;
                                             ~ 1 ] |= ϕ1 , for all U
                                                                             
  - M |= (∀Z11 ) . . . (∀Zm1
                              )ϕ1 iff M[Z
                                        ~ 2 /U                     ~ 2 ∈ P(P(D)) p .
                                             ~ 2 ] |= ϕ2 , for all U
                                                                                
  - M |= (∀Z12 ) . . . (∀Zp2 )ϕ2 iff M[Z
Finally, compound formulae are interpreted according to the standard rules of
propositional logic. If M |= ϕ, then M is said to be a 4LQS-model for ϕ. A
4LQS-formula is said to be satisfiable if it has a 4LQS-model. A 4LQS-formula is
valid if it is satisfied by all 4LQS-interpretations.
    We are now ready to present the fragment 4LQSR of 4LQS of our interest.
This is the collection of the formulae ψ of 4LQS fulfilling the restrictions:
 1. for every purely universal formula (∀Z11 ) . . . (∀Zm
                                                        1
                                                          )ϕ1 of level 2 occurring in
    ψ and every purely universal formula (∀z1 ) . . . (∀zn )ϕ0 of level 1 occurring
    negatively in ϕ1 , ϕ0 is a propositional combination of quantifier-free atomic
    formulae of level 0 and the condition
                                           Vn Vm
                                      ¬ϕ0 → i=1 j=1 zi ∈ Zj1
    is a valid 4LQS-formula (in this case we say that (∀z1 ) . . . (∀zn )ϕ0 is linked
    to the variables Z11 , . . . , Zm
                                    1
                                      );
 2. for every purely universal formula (∀Z12 ) . . . (∀Zp2 )ϕ2 of level 3 in ψ:
                Conjunctive Query Answering via a Fragment of Set Theory         27

      - every purely universal formula of level 1 occurring negatively in ϕ2 and
        not occurring in a purely universal formula of level 2 is only allowed to
        be of the form
                                                   n V
                                                     n
                                                       hzi , zj i = Yij2 ),
                                                   V
                             (∀z1 ) . . . (∀zn )¬(
                                               i=1 j=1

        with Yij2 ∈ V 2 , for i, j = 1, . . . , n;
      - purely universal formulae (∀Z11 ) . . . (∀Zm
                                                   1
                                                     )ϕ1 of level 2 may occur only
        positively in ϕ2 .
    Restriction 1 has been introduced for technical reasons concerning the decid-
ability of the satisfiability problem for the fragment, while restriction 2 allows
one to define binary relations and several operations on them (for space reasons
details are not included here but can be found in [2]).
    The semantics of 4LQSR plainly coincides with that of 4LQS.


2.2   The logic DLh4LQSR,× i(D)
                                                                            4,×
The description logic DLh4LQSR,× i(D) (more simply referred to as DLD           ) is
                                     R                      4
the extension of the logic DLh4LQS i(D) (for short DLD ) presented in [7] in
which Boolean operations on concrete roles and the product of concepts are
                                                 4,×
admitted. Analogously to DL4D , the logic DLD         supports concept constructs
such as full negation, union and intersection of concepts, concept domain and
range, existential quantification and min cardinality on the left-hand side of
inclusion axioms, role constructs such as role chains on the left hand side of
inclusion axioms, union, intersection, and complement of roles, and properties
on roles such as transitivity, symmetry, reflexivity, and irreflexivity.
                                                                             4,×
    As far as the construction of role inclusion axioms is concerned, DLD         is
more liberal than SROIQ(D) [12] (the logic underlying the most expressive
Ontology Web Language 2 profile, OWL 2 DL [16]), since the roles involved are
not required to be subject to any ordering relationship, and the notion of simple
role is not needed. DL4,×
                       D treats derived datatypes by admitting datatype terms
constructed from data ranges by means of a finite number of applications of
the Boolean operators. Basic and derived datatypes can be used inside inclusion
axioms involving concrete roles.
    Datatypes are defined according to [14] as follows. Let D = (ND , NC , NF , ·D )
be a datatype map, where ND is a finite set of datatypes, NC is a map assigning
a set of constants NC (d) to each datatype d ∈ ND , NF is a map assigning a
set of facets NF (d) to each d ∈ ND , and ·D is a map assigning (i) a datatype
interpretation dD to each datatype d ∈ ND , (ii) a facet interpretation f D ⊆ dD
to each facet f ∈ NF (d), and (iii) a data value eD    d ∈ d
                                                              D
                                                                 to every constant
ed ∈ NC (d). We shall assume that the interpretations of the datatypes in ND
are non-empty pairwise disjoint sets.
    A facet expression for a datatype d ∈ ND is a formula ψd constructed from
the elements of NF (d) ∪ {>d , ⊥d } by applying a finite number of times the
connectives ¬, ∧, and ∨. The function ·D is extended to facet expressions for
28          D. Cantone, M. Nicolosi-Asmundo, D.F. Santamaria

d ∈ ND by putting >D         D       D            D      D    D          D
                       d = d , ⊥d = ∅, (¬f ) = d \ f , (f1 ∧ f2 ) = f1 ∩ f2 ,
                                                                                D      D
              D      D    D
and (f1 ∨ f2 ) = f1 ∪ f2 , for f, f1 , f2 ∈ NF (d).
    A data range dr for D is either a datatype d ∈ ND , or a finite enumeration
of datatype constants {ed1 , . . . , edn }, with edi ∈ NC (di ) and di ∈ ND , or a facet
expression ψd , for d ∈ ND , or their complementation.
    Let RA , RD , C, Ind be denumerable pairwise disjoint sets of abstract role
names, concrete role names, concept names, and individual names, respectively.
We assume that the set of abstract role names RA contains a name U denoting
the universal role.
(a) DL4,×                     4,×                     4,×
       D -datatype, (b) DLD -concept, (c) DLD -abstract role, and (d) DLD -
                                                                                     4,×

concrete role terms are constructed according to the following syntax rules:
(a) t1 , t2 −→ dr | ¬t1 | t1 u t2 | t1 t t2 | {ed } ,
(b) C1 , C2 −→ A | > | ⊥ | ¬C1 | C1 t C2 | C1 u C2 | {a} | ∃R.Self |∃R.{a}|∃P.{ed } ,
(c) R1 , R2 −→ S | U | R1− | ¬R1 | R1 t R2 | R1 u R2 | RC1 | | R|C1 | RC1 | C2 | id(C) |
    C1 × C2 ,
(d) P1 , P2 −→ T | ¬P1 | P1 t P2 | P1 u P2 | PC1 | | P|t1 | PC1 |t1 ,
where dr is a data range for D, t1 , t2 are data-type terms, ed is a constant in
NC (d), a is an individual name, A is a concept name, C1 , C2 are DL4,×D -concept
terms, S is an abstract role name, R, R1 , R2 are DL4,×
                                                    D   -abstract role terms, T is
                                             4,×
a concrete role name, and P, P1 , P2 are DLD -concrete role terms.
   A DL4,×
         D -knowledge base is a triple KB = (R, T , A) such that R is a DLD -
                                                                               4,×
                  4,×                    4,×
RBox, T is a DLD -TBox, and A a DLD -ABox (see next).
   A DL4,×D -RBox is a collection of statements of the following forms:
 R1 ≡ R2 , R1 v R2 , R1 . . . Rn v Rn+1 , Sym(R1 ), Asym(R1 ), Ref(R1 ), Irref(R1 ),
Dis(R1 , R2 ), Tra(R1 ), Fun(R1 ), R1 ≡ C1 × C2 , P1 ≡ P2 , P1 v P2 , Dis(P1 , P2 ), Fun(P1 ),
where R1 , R2 are DL4,×                                     4,×
                     D -abstract role terms, C1 , C2 are DLD -abstract concept
                          4,×
terms, and P1 , P2 are DLD -concrete role terms. Any expression of the type
w v R, where w is a finite string of DL4,× D -abstract role terms and R is an
   4,×
DLD -abstract role term is called a role inclusion axiom (RIA).
   Next, a DL4,×
               D -TBox is a set of statements of the types:
C1 ≡ C2 , C1 v C2 , C1 v ∀R1 .C2 , ∃R1 .C1 v C2 , ≥nR1 .C1 v C2 , C1 v ≤nR1 .C2 ,
t1 ≡ t2 ,     t1 v t2 ,   C1 v ∀P1 .t1 ,   ∃P1 .t1 v C1 ,   ≥nP1 .t1 v C1 ,   C1 v ≤nP1 .t1 ,
where C1 , C2 are DL4,×                                                       4,×
                      D -concept terms, t1 , t2 datatype terms, R1 a DLD -abstract
                           4,×
role term, and P1 a DLD -concrete role term. Any statement C v D, with C
and D DL4,×  D -concept terms, is a general concept inclusion axiom (GCI).
     Finally, a DL4,×
                   D -ABox is a set of individual assertions of the forms: a : C1 ,
                                                                                        4,×
(a, b) : R1 , a = b, ed : t1 , (a, ed ) : P1 , with a, b individual names, C1 a DLD         -
concept term, R1 a DL4,× D  -abstract    role term, d a datatype, e d a constant in N C (d),
t1 a datatype term, and P1 a DL4,×        D -concrete role term. As mentioned above,
     4,×
DLD is more liberal than SROIQ(D) in the construction of role inclusion
axioms. For example, the role hierarchy {RS v S, RT v R, V T v T, V S v V }
presented in [12] is expressible in DL4,×   D , but not in SROIQ(D).
                   Conjunctive Query Answering via a Fragment of Set Theory                  29

    The semantics of DL4,×                                    I       I
                         D is based on interpretations I = (∆ , ∆D , · ), where ∆
                                                                                  I
                                                       D
and ∆D are non-empty disjoint domains such that d ⊆ ∆D , for every d ∈ ND ,
and ·I is an interpretation function. The interpretation of concepts and roles,
and of axioms and assertions is illustrated in [3, Table 1].
    Let KB = (R, T , A) be a DL4,×  D -knowledge base. An interpretation I =
(∆ , ∆D , ·I ) is a D-model of R (and write I |=D R) if I satisfies each axiom in
  I

R according to the semantic rules in [3, Table 1]. Similar definitions hold for T
and A too. Then I satisfies KB (and write I |=D KB) if it is a D-model of R, T ,
and A. A knowledge base is consistent if it is satisfied by some interpretation.


3    Conjunctive Query Answering for DL4,×
                                       D

Let V = {v1 , v2S , . . .} be a denumerable and infinite set of variables disjoint from
                                                    4,×
Ind and from {NC (d) : d ∈ ND }. A DLD                  -atomic formula is an expression of
of the following types
              R(w1 , w2 ), P (w1 , u1 ), C(w1 ), w1 = w2 , u1 = u2 ,
                                                    S                                       4,×
where w1 , w2 ∈ V ∪ Ind, u1 , u2 ∈ V ∪ {NC (d) : d ∈ ND }, R is a DLD                           -
                                     4,×                                           4,×
abstract role term, P is a DLD -concrete role term, and C is a DLD -concept
term. A DL4,×  D -atomic formula containing no variables is said to be closed. A
DL4,×
    D  -literal is a DL4,×                                                     4,×
                            D -atomic formula or its negation. A DLD -conjunctive
                                     4,×
query isS a conjunction of DLD -literals. Let v1 , . . . , vn ∈ V and o1 , . . . , on ∈
Ind ∪ {NC (d) : d ∈ ND }. A substitution σ := {v1 /o1 , . . . , vn /on } is a map
such that, for every DL4,×     D -literal L, Lσ is obtained from L by replacing the
occurrences of v1 , . . . , vn in L with o1 , . . . , on , respectively. Substitutions can be
extended to DL4,×  D -conjunctive queries in the usual way. Let Q := (L1 ∧ . . . ∧ Lm )
be a DLD -conjunctive query, and KB a DL4,×
          4,×
                                                        D -knowledge base. A substitution
σ involving exactly the variables occurring in Q is a solution for Q w.r.t. KB
if there exists a DL4,×   D -interpretation I such that I |=D KB and I |=D Qσ. The
collection Σ of the solutions for Q w.r.t. KB is the answer set of Q w.r.t. KB.
Then the conjunctive query answering (CQA) problem for Q w.r.t. KB consists
in finding the answer set Σ of Q w.r.t. KB.
    We shall solve the CQA problem just stated by reducing it to the analo-
gous problem formulated in the context of the fragment 4LQSR (and in turn
to the decision procedure for 4LQSR presented in [2]). The CQA problem for
4LQSR -formulae can be stated as follows. Let φ be a 4LQSR -formula and let ψ
be a conjunction of 4LQSR -quantifier-free atomic formulae of level 0 of the types
x = y, x ∈ X 1 , hx, yi ∈ X 3 or their negations, such that Var0 (ψ) ∩ Var0 (φ) = ∅
and Var1 (ψ) ∪ Var3 (ψ) ⊆ Var1 (φ) ∪ Var3 (φ). The CQA problem for ψ w.r.t. φ
consists in computing the answer set of ψ w.r.t. φ, namely the collection Σ 0 of
all the substitutions σ 0 := {x1 /y1 , . . . , xn /yn } (where x1 , . . . , xn are the distinct
variables of level 0 in ψ and {y1 , . . . , yn } ⊆ Var0 (φ)) such that M |= φ ∧ ψσ 0 ,
for some 4LQSR -interpretation M. In view of the decidability of the satisfia-
bility problem for 4LQSR -formulae, the CQA problem for 4LQSR -formulae is
decidable as well. Indeed, given two 4LQSR -formulae φ and ψ satisfying the
30         D. Cantone, M. Nicolosi-Asmundo, D.F. Santamaria

above requirements, to compute the answer set of ψ w.r.t. φ, for each candi-
date substitution σ 0 := {x1 /y1 , . . . , xn /yn } (with {x1 , . . . , xn } = Var0 (ψ) and
{y1 , . . . , yn } ⊆ Var0 (φ)) one has just to test for satisfiability the 4LQSR -formula
φ ∧ ψσ 0 . Since the number of possible candidate substitutions is |Var0 (φ)||Var0 (ψ)|
and the satisfiability problem for 4LQSR -formulae is the decidable, it follows that
the answer set of ψ w.r.t. φ can be computed effectively. Summarizing,
Lemma 1. The CQA problem for 4LQSR -formulae is decidable.                                 t
                                                                                           u

   The following theorem states that also the CQA problem for DL4,×
                                                                D is decid-
able.
Theorem 1. Given a DL4,×                             4,×
                        D -knowledge base KB and a DLD -conjunctive query
Q, the CQA problem for Q w.r.t. KB is decidable.

Proof (sketch). For space reasons we just outline the main ideas of the proof.
The interested reader, however, can find complete details in the extended version
of this paper (see [3]).
    As remarked above, the CQA problem for DL4,×  D can be solved via an effective
reduction to the CQA problem for 4LQSR -formulae, and then exploiting Lemma 1.
The reduction is accomplished through a function θ that maps effectively variables
in V and individuals in Ind into variables of sort 0 (of the 4LQSR -language),
etc., DL4,×                                            4,×
          D -TBoxes, -RBoxes, and -ABoxes, and DLD -conjunctive queries into
      R
4LQS -formulae in conjunctive normal form (CNF), which can be used to map
                                       4,×
effectively CQA problems from the DLD      -context into the 4LQSR -context. More
                         4,×                               4,×
specifically, given a DLD -knowledge base KB and a DLD         -conjunctive query Q,
using the function θ we can effectively construct the following 4LQSR -formulae
in CNF:                                V12
                                                      ψQ := θ(Q) .2
                         V
                  φKB := H∈KB θ(H) ∧ i=1 ξi ,
Then, if we denote by Σ the answer set of Q w.r.t. KB and by Σ 0 the answer
set of ψQ w.r.t. φKB , we have that Σ consists of all substitutions σ (involving
exactly the variables occurring in Q) such that θ(σ) ∈ Σ 0 . Since, by Lemma 1,
Σ 0 can be computed effectively, then Σ can be computed effectively too.          t
                                                                                  u


4      A tableau-based procedure

In this section, we illustrate a KE-tableau based procedure that, given a 4LQSR -
                                       4,×
formula φKB corresponding to a DLD         -knowledge base and a 4LQSR -formula
2
     The definition of the function θ is inspired to that of the function τ introduced in the
     proof of Theorem 1 in [7]. Specifically, θ differs from τ as (i) it allows quantification
     only on variables of level 0, (ii) it treats Boolean operations on concrete roles and
     the product of concepts, and (iii) it constructs 4LQSR -formulae in CNF. In addition,
     the constraints ξ1 –ξ12 are similar to the constraints ψ1 –ψ12 introduced in the proof
     of Theorem 1 in [7]; they are introduced to guarantee that each model of φKB can
                                  ×
     be transformed into a DL4,  D -interpretation. Details of the construction of θ and of
     ξ1 –ξ12 can be found in [3].
                   Conjunctive Query Answering via a Fragment of Set Theory               31

ψQ corresponding to a DL4,×      D -conjunctive query Q, yields all the substitutions
σ = {x1 /y1 , . . . , xn /yn }, with {x1 , . . . , xn } = Var0 (ψQ ) and {y1 , . . . , yn } ⊆
Var0 (φKB ), belonging to the answer set Σ 0 of ψQ w.r.t. φKB .
    Let φKB be the formula obtained from φKB by:
  - moving universal quantifiers in φKB as inwards as possible according to the
    rule (∀z)(A(z) ∧ B(z)) ←→ ((∀z)A(z) ∧ (∀z)B(z)),
  - renaming universally quantified variables so as to make them pairwise dis-
    tinct.
    Let F1 , . . . , Fk be the conjuncts of φKB that are 4LQSR -quantifier-free atomic
formulae and S1 , . . . , Sm the conjuncts of φKB that are 4LQSR -purely universal
formulae. For every Si = (∀z1i ) . . . (∀zni i )χi , i = 1, . . . , m, we put
                                                      Si {z1i /xa1 , . . . , zni i /xani }.
                                     V
            Exp(Si ) :=
                             {xa1 ,...,xan }⊆Var0 (φKB )
                                          i
                                            m
                                            S
Let ΦKB := {Fj : i = 1, . . . , k} ∪               Exp(Si ).
                                             i=1
    To prepare for the KE-tableau based procedure to be described next, we
introduce some useful notions and notations (see [10] for a detailed overview of
KE-tableau, an optimized variant of semantic tableaux).
    Let Φ = {C1 , . . . , Cp } be a collection of disjunctions of 4LQSR -quantifier-free
atomic formulae of level 0 of the types: x = y, x ∈ X 1 , hx, yi ∈ X 3 . T is a
KE-tableau for Φ if there exists a finite sequence T1 , . . . , Tt such that (i) T1 is
a one-branch tree consisting of the sequence C1 , . . . , Cp , (ii) Tt = T , and (iii)
for each i < t, Ti+1 is obtained from Ti by an application of one of the rules in
Fig 1. The set of formulae Siβ = {β 1 , . . . , β n } \ {β i } occurring as premise in the
E-rule contains the complements of all the components of the formula β with
the exception of the component βi .

            β1 ∨ . . . ∨ βn         Siβ
                                          E-Rule                         PB-Rule
                        βi                                     A | A
            where Siβ := {β 1 , ..., β n } \ {β i },           with A a literal
            for i = 1, ..., n


                        Fig. 1. Expansion rules for the KE-tableau.


    Let T be a KE-tableau. A branch ϑ of T is closed if it contains both A and
¬A, for some formula A. Otherwise, the branch is open. A formula β1 ∨ . . . ∨ βn is
fulfilled in a branch ϑ, if βi is in ϑ, for some i = 1, . . . , n. A branch ϑ is complete
if every formula β1 ∨ . . . ∨ βn occurring in ϑ is fulfilled. A KE-tableau is complete
if all its branches are complete.
    Next we introduce the procedure Saturate-KB that takes as input the set
                                                                           4,×
ΦKB constructed from a 4LQSR -formula φKB representing a DLD                   -knowledge
base KB as shown above, and yields a complete KE-tableau TKB for ΦKB .
Procedure 1 Saturate-KB(ΦKB )
 1. TKB := ΦKB ;
32       D. Cantone, M. Nicolosi-Asmundo, D.F. Santamaria

 2. Select an open branch ϑ of TKB that is not yet complete.
    (a) Select a formula β1 ∨ . . . ∨ βn on ϑ that is not fulfilled.
     (b) If Sjβ is in ϑ, for some j ∈ {1, . . . , n}, apply the E-Rule to β1 ∨ . . . ∨ βn and
         Sjβ on ϑ and go to step 2.
    (c) If Sjβ is not in ϑ, for every j = 1, . . . , n, let B β be the collection of formulae
        β 1 , . . . , β n present in ϑ and let β h be the lowest index formula such that β h ∈
        {{β 1 , . . . , β n } \ B β }, then apply the PB-rule to β h on ϑ, and go to step 2.
 3. Return TKB .
    Soundness of Procedure 1 can be easily proved in a standard way and its com-
pleteness can be shown much along the lines of Proposition 36 in [10]. Concerning
termination of Procedure 1, our proof is based on the following two facts. The
rules in Fig. 1 are applied only to non-fulfilled formulae on open branches and
tend to reduce the number of non-fulfilled formulae occurring on the considered
branch. In particular, when the E-Rule is applied on a branch ϑ, the number of
non-fulfilled formulae on ϑ decreases. In case of application of the PB-Rule on
a formula β = β1 ∨ . . . ∨ βn on a branch, the rule generates two branches. In
one of them the number of non-fulfilled formulae decreases (because β becomes
fulfilled). In the other one the number of non-fulfilled formulae stays constant
but the subset B β of {β 1 , . . . , β n } occurring on the branch gains a new element.
Once |B β | gets equal to n − 1, namely after at most n − 1 applications of the
PB-rule, the E-rule is applied and the formula β = β1 ∨ . . . ∨ βn becomes fulfilled,
thus decrementing the number of non-fulfilled formulae on the branch. Since the
number of non-fulfilled formulae on each open branch gets equal to zero after a
finite number of steps and the rules of Fig. 1 can be applied only to non-fulfilled
formulae on open branches, the procedure terminates.
    By the completeness of Procedure 1, each open branch ϑ of TKB induces a
4LQSR -interpretation Mϑ such that Mϑ |= ΦKB . We define Mϑ = (Dϑ , Mϑ )
as follows. We put Dϑ := {x ∈ V0 : x occurs in ϑ}; Mϑ x := x, for every x ∈ Dϑ ;
Mϑ XC1 = {x : x ∈ XC1 is in ϑ}, for every XC1 ∈ V1 occurring ϑ; Mϑ XR       3
                                                                              = {hx, yi :
            3                            3
hx, yi ∈ XR is in ϑ}, for every XR ∈ V3 occurring in ϑ. It is easy to check that
Mϑ |= φKB and thus, plainly, that Mϑ |= φKB .
    Next, we provide some complexity results. Let r be the maximum number
of universal quantifiers in Si , and k := |Var0 (φKB )|. Then, each Si generates k r
expansions. Since the knowledge base contains m such formulae, the number of
disjunctions in the initial branch of the KE-tableau is m · k r . Next, let ` be the
maximum number of literals in Si , for i = 1, . . . , m. Then, the maximum depth
of the KE-tableau, namely the maximum size of the models of ΦKB constructed
as illustrated above, is O(`mk r ) and the number of leaves of the tableau, that is
                                                      r
the number of such models of ΦKB , is O(2`mk ).
    We now describe a procedure that, given a KE-tableau constructed by Proce-
dure 1 and a 4LQSR -formula ψQ representing a DL4,×       D -conjunctive query Q, yields
all the substitutions σ 0 in the answer set Σ 0 of ψQ w.r.t. φKB . By the soundness
of Procedure 1, we can limit ourselves to consider only the models Mϑ of φKB
induced by each open branch ϑ of TKB . For every open and complete branch ϑ
                    Conjunctive Query Answering via a Fragment of Set Theory                       33

of TKB , we construct a decision tree Dϑ such that every maximal branch of Dϑ
defines a substitution σ 0 such that Mϑ |= ψQ σ 0 .
    Let d be the number of literals in ψQ . Dϑ is a finite labelled tree of depth
d + 1 whose labelling satisfies the following conditions, for i = 0, . . . , d: (i) every
node of Dϑ at level i is labelled with (σi , ψQ σi ), and, in particular, the root is
labelled with (σ00 , ψQ σ00 ), where σ00 is the empty substitution; (ii) if a node at
level i is labelled with (σi0 , ψQ σi0 ), then its s-successors, with s > 0, are labelled
with σi0 %q1i +1 , ψQ (σi0 %q1i +1 ) , . . . , σi0 %qsi +1 , ψQ (σi0 %qsi +1 ) , where qi+1 is the (i +
                                                                             

1)-st conjunct of ψQ σi0 and Sqi+1 = {%q1i +1 , . . . , %qsi +1 } is the collection of the
substitutions % = {x1 /y1 , . . . , xj /yj } with {x1 , . . . , xj } = Var0 (qi+1 ) such that
p = qi+1 %, for some literal p on ϑ. If s = 0, the node labelled with (σi0 , ψQ σi0 ) is
a leaf node and, if i = d, σi0 is added to Σ 0 .
    Let δ(TKB ) and λ(TKB ) be, respectively, the maximum depth of TKB and
the number of leaves of TKB computed above. Plainly, δ(TKB ) = O(`mk r ) and
                     r
λ(TKB ) = O(2`mk ). It is easy to verify that s = 2k is the maximum branching
of Dϑ . Since Dϑ is a s-ary tree of depth d + 1, where d is the number of literals
in ψQ , and the s-successors of a node are computed in O(δ(TKB )) time, the
number of leaves in Dϑ is O(s(d+1) ) = O(2k(d+1) ) and they are computed in
O(2k(d+1) δ(TKB )) time. Finally, since we have λ(TKB ) of such decision trees,
the answer set of ψQ w.r.t. φKB is computed in O(2k(d+1) δ(TKB )λ(TKB )) =
                               r                                     r
O(2k(d+1) · `mk r · 2`mk ) = O(`mk r 2k(d+1)+`mk ) time. Since the size of φKB
and of ψQ are polynomially related to those of KB and of Q, respectively (see [3]
for details), the construction of the answer set of Q with respect to KB can
be done in double-exponential time. In case KB contains no role chain axioms
and qualified cardinality restrictions, the complexity of our CQA problem is in
EXPTIME, since the maximum number of universal quantifiers in φKB , namely
r, is a constant (in particular r = 3). We remark that such result is comparable
with the complexity of the CQA problem for a large family of description logics
such as SHIQ [15]. In particular, the CQA problem for the very expressive
description logic SROIQ turns out to be 2-NEXPTIME-complete.


5     Conclusions

We have introduced the description logic DLh4LQSR,× i(D) (DL4,×D , for short) that
extends the logic DLh4LQSR i(D) with Boolean operations on concrete roles and
with the product of concepts. We addressed the problem of Conjunctive Query
                                       4,×
Answering for the description logic DLD    by formalizing DL4,×
                                                            D -knowledge bases
         4,×
and DLD -conjunctive queries in terms of formulae of 4LQSR . Such formalization
seems to be promising for implementation purposes.
    In our approach, we first constructed a KE-tableau TKB for φKB , a 4LQSR -
formalization of a given DL4,×
                             D -knowledge base KB, whose branches induce the
models of φKB . Then we computed the answer set of a 4LQSR -formula ψQ , repre-
senting a DL4,×
              D -conjunctive query Q, with respect to φKB by means of a forest
of decision trees based on the branches of TKB and gave some complexity results.
34      D. Cantone, M. Nicolosi-Asmundo, D.F. Santamaria

   We plan to generalize our procedure with a data-type checker in order to
extend reasoning with data-types, and also to extend 4LQSR with data-type
groups. We also intend to improve the efficiency of the knowledge base saturation
algorithm and query answering algorithm, and to extend the expressiveness of the
queries. Finally, we intend to study a parallel model of the procedure described
and to provide an implementation of it.


References

 1. D. Calvanese, G. D. Giacomo, and M. Lenzerini, “On the decidability of query
    containment under constraints,” in Proc. of the 17th ACM SIGACT-SIGMOD-
    SIGART Symp. on Princ. of Database Systems, ser. PODS ’98. New York, NY,
    USA: ACM, 1998, pp. 149–158.
 2. D. Cantone and M. Nicolosi-Asmundo, “On the satisfiability problem for a 4-level
    quantified syllogistic and some applications to modal logic,” Fundamenta Informat-
    icae, vol. 124, no. 4, pp. 427–448, 2013.
 3. D. Cantone, M. Nicolosi-Asmundo, and D. F. Santamaria, “Conjunctive Query
    Answering via a Fragment of Set Theory (Extended Version),” CoRR, vol.
    abs/1606.07337, 2016.
 4. D. Cantone and C. Longo, “A decidable two-sorted quantified fragment of set theory
    with ordered pairs and some undecidable extensions,” Theor. Comput. Sci., vol.
    560, pp. 307–325, 2014.
 5. D. Cantone, C. Longo, and M. Nicolosi-Asmundo, “A decision procedure for a two-
    sorted extension of multi-level syllogistic with the Cartesian product and some map
    constructs,” in Proc. of the 25th Italian Conference on Computational Logic (CILC
    2010), Rende, Italy, July 7-9, 2010, W. Faber and N. Leone, Eds., vol. 598. CEUR
    Workshop Proceedings, ISSN 1613-0073, June 2010, pp. 1–18 (paper 11).
 6. D. Cantone, C. Longo, and M. Nicolosi-Asmundo, “A decidable quantified fragment
    of set theory involving ordered pairs with applications to description logics,” in Proc.
    Computer Science Logic, 20th Annual Conf. of the EACSL, CSL 2011, September
    12-15, 2011, Bergen, Norway, 2011, pp. 129–143.
 7. D. Cantone, C. Longo, M. Nicolosi-Asmundo, and D. F. Santamaria, “Web ontology
    representation and reasoning via fragments of set theory,” in Web Reasoning and
    Rule Systems - 9th International Conference, RR 2015, Berlin, Germany, August
    4-5, 2015, Proc., LNCS vol. 9209, Springer, ISBN 978-3-319-22001-7 pp. 61–76.
 8. D. Cantone, C. Longo, and A. Pisasale, “Comparing description logics with multi-
    level syllogistics: the description logic DLhMLSS× 2,m i,” in 6th Workshop on Semantic
    Web Applications and Perspectives (Bressanone, Italy, Sep. 21-22, 2010), P. Traverso,
    Ed., 2010, pp. 1–13.
 9. D. Cantone, M. Nicolosi-Asmundo, D. F. Santamaria, and F. Trapani “Ontoceramic:
    an owl ontology for ceramics classification,” in Proc. of the 30th Italian Conf.
    on Computational Logic, CILC 2015, July 1-3 2015 Genova, CEUR Electronic
    Workshop Proceedings, vol. 1459, pp. 122–127.
10. M. D’Agostino, “Tableau methods for classical propositional logic,” in Handbook of
    Tableau Methods, M. D’Agostino, D. M. Gabbay, R. Hähnle, and J. Posegga, Eds.
    Springer, 1999, pp. 45–123.
11. N. Dershowitz and J.-P. Jouannaud, “Rewrite systems,” in Handbook of Theoretical
    Computer Science (Vol. B), J. van Leeuwen, Ed. Cambridge, MA, USA: MIT
    Press, 1990, pp. 243–320.
                 Conjunctive Query Answering via a Fragment of Set Theory            35

12. I. Horrocks, O. Kutz, and U. Sattler, “The even more irresistible SROIQ,” in Proc.
    10th Int. Conf. on Princ. of Knowledge Representation and Reasoning, (Doherty,
    P. and Mylopoulos, J. and Welty, C. A., eds.). AAAI Press, 2006, pp. 57–67.
13. U. Hustadt, B. Motik, and U. Sattler, “Data complexity of reasoning in very ex-
    pressive description logics,” in IJCAI-05, Proc. of the 19th Inter. Joint Conf. on
    Art. Intell., Edinburgh, Scotland, UK, July 30-August 5, 2005, 2005, pp. 466–471.
14. B. Motik and I. Horrocks, “OWL datatypes: Design and implementation,” in Proc.
    of the 7th Int. Semantic Web Conference (ISWC 2008), ser. LNCS, vol. 5318.
    Springer, October 26–30 2008, pp. 307–322.
15. M. Ortiz, R. Sebastian, and M. Šimkus, “Query answering in the horn fragments of
    the description logics shoiq and sroiq,” in Proc. of the Twenty-Second International
    Joint Conference on Artificial Intelligence - Vol.Two, ser. IJCAI’11. AAAI Press,
    2011, pp. 1039–1044.
16. World Wide Web Consortium, “OWL 2 Web ontology language structural specifi-
    cation and functional-style syntax (second edition).”