A set-based reasoner
                for the description logic DL4,×
                                            D


             Domenico Cantone, Marianna Nicolosi-Asmundo, and
                       Daniele Francesco Santamaria

        University of Catania, Dept. of Mathematics and Computer Science
              email: {cantone,nicolosi,santamaria}@dmi.unict.it


      Abstract. We present a KE-tableau-based implementation of a rea-
      soner for a decidable fragment of (stratified) set theory expressing the
                                                  ×
      description logic DLh4LQSR,× i(D) (DL4,   D , for short). Our application
                                                                       ×
      solves the main TBox and ABox reasoning problems for DL4,       D . In par-
                                                         ×
      ticular, it solves the consistency problem for DL4,
                                                       D   -knowledge  bases rep-
      resented in set-theoretic terms, and a generalization of the Conjunctive
      Query Answering problem in which conjunctive queries with variables
      of three sorts are admitted. The reasoner, which extends and optimizes
                                                                    ×
      a previous prototype for the consistency checking of DL4,   D -knowledge
                                                                  4,×
      bases (see [7]), is implemented in C++. It supports DLD -knowledge
      bases serialized in the OWL/XML format, and it admits also rules ex-
      pressed in SWRL (Semantic Web Rule Language).


1   Introduction
A wealth of decidability results has been collected over the years within the re-
search field of Computable Set Theory [2, 11, 17]. However, only recently some
of these results have been applied in the context of knowledge representation
and reasoning for the semantic web. Such efforts have been motivated by the
characteristics of the set-theoretic fragments considered, as they provide very
expressive unique formalisms that combine the modelling capabilities of a rule
language with the constructs of description logics. The decidable multi-sorted
quantified set-theoretic fragment 4LQSR [3] is appropriate in this sense, in con-
sideration of the fact that its decision procedure is efficiently implementable. We
recall that the language of 4LQSR involves variables of four sorts, pair terms,
and a restricted form of quantification.
    In [5], the theory 4LQSR has been used to represent the expressive description
logic DL4,×
          D by means of a suitable translation mapping. Moreover, decidability of
the most widespread reasoning problems for DL4,×  D , such as the consistency prob-
lem and the Conjunctive Query Answering (CQA) problem for DL4,×        D -knowledge
bases (KBs) were proved via a reduction to the satisfiability problem for 4LQSR .
Since 4LQSR admits variables of four sorts, the CQA problem was generalized
in such a way as to admit queries over three sorts of variables. Such a general-
ization, called Higher-Order Conjunctive Query Answering (HOCQA) problem
can be instantiated to the most widespread reasoning tasks for DL4,×    D -ABox.
    The description logic DL4,× D admits Boolean operators on concepts and ab-
stract roles, concept domain and range, and existential and minimum cardinality
restriction on the left-hand side of inclusion axioms. It also supports role chains
on the left-hand side of inclusion axioms and properties on roles such as transi-
tivity, symmetry, and reflexivity. In [4], its consistency problem has been shown
to be NP-complete under not very restrictive constraints. Such a low complexity
result depends on the fact that existential quantification cannot appear on the
right-hand side of inclusion axioms. Nonetheless, DL4,×   D turns out to be more
expressive than other low complexity logics such as OWL RL [16] and there-
fore it is very suitable for representing real-world ontologies. For instance, the
restricted version of DL4,×
                          D mentioned above allows one to express several OWL
ontologies, such as ArcheOntology [16] and OntoCeramic [10], for the classification
of archaeological finds, and ArchivioMuseoFabbrica [1], concerning the renovation
of the Monastery of San Nicola l’Arena in Catania by the architect Giancarlo
De Carlo. Since existential quantification is admitted only on the left-hand side
of inclusion axioms, DL4,×
                         D is less expressive than logics such as SROIQ(D) [13]
as long as the generation of new individuals is concerned. On the other hand,
DL4,×
    D is more liberal than SROIQ(D) in the definition of role inclusion ax-
ioms, as the roles involved in DL4,×
                                  D are not subject to any ordering relationship,
and the notion of simple role is not needed. For example, the role hierarchy pre-
sented in [13, page 2] is not expressible in SROIQ(D), but can be represented
in DL4,×                     4,×
        D . In addition, DLD is a powerful rule language able to express rules
with negated atoms such as
            Person(?p) ∧ ¬hasHome(?p, ?h) =⇒ HomelessPerson(?p)
that are not supported by the SWRL language.
    In [7], we presented a first effort to implement in C++ a KE-tableau-based
decision procedure for the consistency problem of DL4,×  D -KBs, by resorting to
the algorithm introduced in [5]. The choice of KE-tableau systems [14], instead
of traditional semantic tableaux [19], was motivated by the fact that KE-tableau
systems introduce an analytic cut rule which permits to construct trees whose
branches define mutually exclusive situations, thus avoiding the proliferation of
redundant branches, typical of Smullyan’s semantic tableaux [19]. Thus, given as
input a consistent KB, the procedure yields a KE-tableau whose open branches
induce distinct models of the KB. Otherwise, a closed KE-tableau is returned.
    In this contribution we improve the reasoner presented in [7] by introducing a
system called KEγ -tableau which admits a generalization of the KE-elimination
rule incorporating the γ-rule, namely the expansion rule for handling universally
quantified formulae. The reasoner also includes a procedure to compute the
HOCQA problem for DL4,×   D . Finally, through suitable benchmark tests, we show
that such a novel reasoner is more efficient than the one introduced in [7].
2     Preliminaries
2.1     The set-theoretic fragment
We summarize the set-theoretic notions underpinning the description logic DL4,×            D
and its reasoning tasks. For the sake of conciseness, we avoid to report here the
syntax and semantics of the whole 4LQSR theory (the interested reader can find it
in [3] together with the decision procedure for its satisfiability problem). Thus, we
focus on the 4LQSR -formulae de facto involved in the set-theoretic representation
of DL4,×                                                          R
        D , namely propositional combinations of 4LQS -literals (atomic formulae
                                  R
or their negations) and 4LQS purely universal formulae of the types displayed
in Table 1. The class of such 4LQSR -formulae is called 4LQSRDL .                   4,×
                                                                                    D

    We recall that the fragment 4LQSR admits four collections, Vari , of variables
of sort i denoted by X i , Y i , Z i , . . ., for i = 0, 1, 2, 3 (variables of sort 0 are also
denoted by x, y, z, . . .). Besides variables, also pair terms of the form hx, yi, with
x, y ∈ Var0 , are allowed. Since the types of formulae displayed in Table 1 do not
contain variables of sort 2, here we limit ourselves only to notions and definitions
relative to 4LQSRDL -formulae involving variables of sorts 0, 1, and 3.
                            4,×
                            D


               Literals of level 0                  Purely universal quantified formulae of level 1
                        1            3  (∀z1 ) . . . (∀zn )ϕ0 , where z1 , . . . , zn ∈ Var0 and ϕ0 is
        x = y, x ∈ X , hx, yi ∈ X
                                                      any propositional combination of
                     1               3
    ¬(x = y), ¬(x ∈ X ), ¬(hx, yi ∈ X )                        literals of level 0.


             Table 1. Types of literals and quantified formulae admitted in 4LQSRDL4,× .
                                                                                          D


    The variables z1 , . . . , zn are said to occur quantified in (∀z1 ) . . . (∀zn )ϕ0 . A
variable occurs free in a 4LQSRDL -formula ϕ if it does not occur quantified in
                                            4,×
                                            D

any subformula of ϕ. For i = 0, 1, 3, we denote with Vari (ϕ) the collections of
variables of sort i occurring free in ϕ.
    Given sequences of distinct variables x (in Var0 ), X 1 (in Var1 ), and X 3
(in Var3 ), of length n, m, and q, respectively, and sequences of (not necessarily
distinct) variables y (in Var0 ), Y 1 (in Var1 ), and Y 3 (in Var3 ), also of length n,
m, and q, respectively, the 4LQSRDL -substitution σ := {x/y, X 1 /Y 1 , X 3 /Y 3 }
                                                  4,×
                                                  D

is the mapping ϕ 7→ ϕσ such that, for any given universal quantified 4LQSRDL -                           4,×
                                                                                                         D

formula ϕ, ϕσ is the result of replacing in ϕ the free occurrences of the variables
xi in x (for i = 1, . . . , n) with the corresponding yi in y, of Xj1 in X 1 (for
j = 1, . . . , m) with Yj1 in Y 1 , and of Xh3 in X 3 (for h = 1, . . . , q) with Yh3 in Y 3 ,
respectively. A substitution σ is free for ϕ if the formulae ϕ and ϕσ have exactly
the same occurrences of quantified variables. The empty substitution, denoted ,
satisfies ϕ = ϕ, for each 4LQSRDL -formula ϕ.
                                            4,×
                                            D

    A 4LQSRDL -interpretation is a pair M = (D, M ), where D is a nonempty
                  4,×
                  D

collection of objects (called domain or universe of M) and M is an assignment
over the variables in Vari , for i = 0, 1, 3, such that:
                   M X 0 ∈ D, M X 1 ∈ P(D), M X 3 ∈ P(P(P(D))),
where X i ∈ Vari , for i = 0, 1, 3, and P(s) denotes the powerset of s.
Pair terms are interpreted à la Kuratowski, and therefore we put
                        M hx, yi := {{M x}, {M x, M y}}.
Next, let
  - M = (D, M ) be a 4LQSRDL -interpretation,
                                 4,×
                                 D

  - x1 , . . . , xn ∈ Var0 , and
  - u1 , . . . , un ∈ D.
By M[x/u], we denote the interpretation M0 = (D, M 0 ) such that M 0 xi = ui
(for i = 1, . . . , n), and which otherwise coincides with M on all remaining vari-
ables. For a 4LQSRDL -interpretation M = (D, M ) and a formula ϕ, the satisfi-
                     4,×
                     D

ability relationship M |= ϕ is recursively defined over the structure of ϕ as fol-
lows. Literals are evaluated in a standard way, based on the usual interpretation
of the predicates ‘∈’ and ‘=’, and of the propositional negation ‘¬’. Compound
formulae are interpreted according to the standard rules of propositional logic.
Finally, purely universal formulae are evaluated as follows:
  - M |= (∀z1 ) . . . (∀zn )ϕ0 iff M[z/u] |= ϕ0 , for all u ∈ Dn .
    If M |= ϕ, then M is said to be a 4LQSRDL -model for ϕ. A 4LQSRDL -
                                                      4,×
                                                      D
                                                                                4,×
                                                                                D

formula is said to be satisfiable if it has a 4LQSRDL -model. A 4LQSRDL -formula
                                                     4,×
                                                     D
                                                                        4,×
                                                                        D

is valid if it is satisfied by all 4LQSRDL -interpretations.
                                        4,×
                                        D


2.2   The logic DLh4LQSR,× i(D)

It is convenient to recall the main notions and definitions concerning the de-
scription logic DLh4LQSR,× i(D) (also called DL4,× D ) [4].
    Let RA , RD , C, and Ind be denumerable pairwise disjoint sets of abstract
role names, concrete role names, concept names, and individual names, respec-
tively. We assume that the set of abstract role names RA contains a name U
denoting the universal role.
    Data types are introduced through the notion of data type maps, defined ac-
cording to [15] as follows. A data type map is a quadruple D = (ND , NC , NF , ·D ),
where ND is a finite set of data types, NC is a function assigning a set of con-
stants NC (d) to each data type d ∈ ND , NF is a function assigning a set of
facets NF (d) to each d ∈ ND , and ·D is (i) a function assigning a data type
interpretation dD to each data type d ∈ ND , (ii) a facet interpretation f D ⊆ dD
to each facet f ∈ NF (d), and (iii) a data value eD    d ∈ d
                                                             D
                                                                to every constant
ed ∈ NC (d). Facets determine subsets of data values considered of interest in
a specific application domain. We shall assume that the interpretations of the
data types in ND are nonempty pairwise disjoint sets.
(a) DL4,×                      4,×                 4,×
        D -data types, (b) DLD -concepts, (c) DLD -abstract roles, and (d) DLD -
                                                                                4,×

concrete role terms are defined according to the DL standard notation (see [13])
as follows:

(a) t1 , t2 −→ dr | ¬t1 | t1 u t2 | t1 t t2 | {ed } ,
(b) C1 , C2 −→ A | > | ⊥ | ¬C1 | C1 tC2 | C1 uC2 | {a} | ∃R.Self |∃R.{a}|∃P.{ed } ,
(c) R1 , R2 −→ S | U | R1− | ¬R1 | R1 tR2 | R1 uR2 | RC1 | | R|C1 | RC1 | C2 | id(C) |
    C1 × C2 ,
(d) P1 , P2 −→ T | ¬P1 | P1 t P2 | P1 u P2 | PC1 | | P|t1 | PC1 |t1 ,

where dr is a data range for D, t1 , t2 are data type terms, ed is a constant in
NC (d), a is an individual name, A is a concept name, C1 , C2 are DL4,×
                                                                     D -concept
                                                    4,×
terms, S is an abstract role name, R, R1 , R2 are DLD -abstract role terms, T is
a concrete role name, and P, P1 , P2 are DL4,×
                                             D -concrete role terms. Notice that
data type terms are intended to represent derived data types.
   A DL4,×                                                      4,×
          D -KB is a triple K = (R, T , A) such that R is a DLD -RBox, T is a
   4,×                     4,×
DLD -TBox, and A a DLD -ABox.
   A DL4,×D -RBox is a collection of statements of the following types:

      R1 ≡ R2 ,    R1 v R 2 ,      R1 . . . Rn v Rn+1 , Sym(R1 ), Asym(R1 ),
      Ref(R1 ),    Irref(R1 ),       Dis(R1 , R2 ),      Tra(R1 ),     Fun(R1 ),
    R1 ≡ C1 × C2 , P1 ≡ P2 ,            P 1 v P2 ,      Dis(P1 , P2 ), Fun(P1 ),

where R1 , R2 are DL4,×                                        4,×
                      D -abstract role terms, C1 , C2 are DLD -abstract concept
                            4,×
terms, and P1 , P2 are DLD -concrete role terms. Any expression of the type
R1 . . . Rn v R, where R1 , . . . , Rn , R are DL4,×
                                                 D -abstract role terms, is called a
role inclusion axiom (RIA).
    A DL4,×D -T Box is a set of statements of the types:

  - C1 ≡ C2 , C1 v C2 , C1 v ∀R.C2 , ∃R.C1 v C2 , ≥nR.C1 v C2 , C1 v ≤nR.C2 ,
  - t1 ≡ t2 , t1 v t2 , C1 v ∀P.t1 , ∃P.t1 v C1 , ≥nP.t1 v C1 , C1 v ≤nP.t1 ,

where C1 , C2 are DL4,×                                               4,×
                     D -concept terms, t1 , t2 data type terms, R a DLD -abstract
                         4,×
role term, and P a DLD -concrete role term. Statements of the form C v D,
where C and D are DL4,×  D -concept terms, are general concept inclusion axioms.
    A DL4,×
         D  -ABox   is a set of individual assertions of the forms:

         a : C1 , (a, b) : R1 ,   a = b,   a 6= b,   ed : t1 ,   (a, ed ) : P1 ,

with C1 a DL4,×D -concept term, d a data type, t1 a data type term, R1 a DLD -
                                                                                 4,×
                             4,×
abstract role term, P1 a DLD -concrete role term, a, b individual names, and ed
a constant in NC (d).
    The semantics of DL4,×                                                   I       I
                          D is given via interpretations of the form I = (∆ , ∆D , · ),
         I                                                        D
where ∆ and ∆D are nonempty disjoint domains such that d ⊆ ∆D , for every
d ∈ ND , and ·I is an interpretation function. The interpretation of concepts and
roles, axioms and assertions is defined in [8, Table 2].
    Let R, T , and A be as above. An interpretation I = (∆I , ∆D , ·I ) is a D-model
of R (resp., T ), and we write I |=D R (resp., I |=D T ), if I satisfies each axiom
in R (resp., T ) according to the semantic rules in [8, Table 2]. Analogously,
I = (∆I , ∆D , ·I ) is a D-model of A, and we write I |=D A, if I satisfies each
assertion in A, according to [8, Table 2]. A DL4,×
                                                 D -KB K = (A, T , R) is consistent
if there exists a D-model I = (∆I , ∆D , ·I ) of A, T , and R.
                                                 ×
The HOCQA problem for DL4,                      D . We recall that the problem of Higher-
Order Conjuctive Query Answering (HOCQA) for DL4,×                     D , introduced in [5], is a
generalization of the Conjunctive Query Answering problem for DL4,×                   D defined
in [4]. The HOCQA problem for DL4,×             D  relies on the notion   of Higher-Order  (HO)
DL4,×
    D   -conjunctive     query,    admitting     variables of three   sorts: individual and data
type variables, concept variables, and role variables. The HOCQA problem for
DL4,×
    D consists in finding the HO answer set of an HO-DLD -conjunctive query
                                                                           4,×
                                             4,×
(see below) with respect to a DLD -KB.
    Specifically, let Vi = {v1 , v2 , . . .}, Ve = {e1, e2, . . .}, Vd = {t1, t2, . . .}, Vc =
{c1 , c2 , . . .}, Var = {r1 , r2 , . . .}, and Vcr = {p1 , p2 , . . .} S
                                                                        be pairwise disjoint de-
numerably infinite sets of variables disjoint from Ind, {NC (d) : d ∈ ND }, C,
RA , and RD . HO-DL4,×      D -atomic formulae are expressions of the following types:

     R(w1 , w2 ), P (w1 , u), C(w1 ), t(u), r(w1 , w2 ), p(w1 , u), c(w1 ), t(u), w1 = w2 ,
where w1 , w2 ∈ Vi ∪Ind, u ∈ Ve ∪ {NC (d) : d ∈ ND }, R is a DL4,×
                                                S
                                                                                               D -abstract role
term, P is a DL4,×      D  -concrete      role  term,   C    is  a   DL 4,×
                                                                        D    -concept      term,     t is a DL4,×D -
                                                                                           4,×
data type term, r ∈ Var , p ∈ Vcr , c ∈ Vc , t ∈ Vd . A HO-DLD -atomic formula
containing no variables is said to be ground. A HO-DL4,×                         D -literal is a HO-DLD -
                                                                                                                 4,×
                                                             4,×
atomic formula or its negation. A HO-DLD -conjunctive query is a conjunction
of HO-DL4,×    D -literals. We denote with λ the empty HO-DLD -conjunctive query.
                                                                                       4,×

     Let v1 , . . . , vn ∈ Vi , e1 , . . . , eg ∈ Ve , t1 , . . . , tl ∈ Vd ,c1 , . .S. , cm ∈ Vc , r1 , . . . , rk ∈
Var , p1 , . . . , ph ∈ Vcr , o1 , . . . , on ∈ Ind, ed1 , . . . , edg ∈ {NC (d) : d ∈ ND },
C1 , . . . , Cm ∈ C, R1 , . . . , Rk ∈ RA , and P1 , . . . , Ph ∈ RD . A substitution
σ := {v1 /o1 , . . . , vn /on , e1 /ed1 , . . . , eg /edg , t1 /t1 , . . . , tl /tl , c1 /C1 , . . . , cm /Cm ,
  r1 /R1 , . . . , rk /Rk , p1 /P1 , . . . , ph /Ph } is a map such that, for every HO-DL4,×                     D -
literal L, Lσ is obtained from L by replacing: (a) the occurrences of vi in L with
oi , for i = 1, . . . , n; (b) the occurrences of eb in L with db , for b = 1, . . . , g; (c)
the occurrences of ts in L with ts , for s = 1, . . . , l; (d) the occurrences of cj in L
with Cj , for j = 1, . . . , m; (e) the occurrences of r` in L with R` , for ` = 1, . . . , k;
(f) the occurrences of pt in L with Pt , for t = 1, . . . , h. Substitutions can be
extended to HO-DL4,×         D -conjunctive queries in the usual way.
     Let Q := (L1 ∧ . . . ∧ Lm ) be a HO-DL4,×            D -conjunctive query, and KB a DLD -
                                                                                                                 4,×

KB. A substitution σ involving exactly the variables occurring in Q is a solution
for Q w.r.t. KB, if there exists a DL4,×            D -interpretation I such that I |=D KB and
I |=D Qσ. The collection Σ of the solutions for Q w.r.t. KB is the higher-order
answer set of Q w.r.t. KB. Then the higher-order conjunctive query answering
problem for Q w.r.t. KB consists in finding the HO answer set Σ of Q w.r.t. KB.
     The HOCQA problem for DL4,×                 D can be instantiated to the most significant
ABox reasoning problems for DL4,×                D (see [5]).


                     ×
Representing DL4,  D     in set-theoretic terms. DL4,×   D -KBs and HO-DLD -
                                                                            4,×

conjunctive queries can be represented in set-theoretic terms by exploiting a
mapping θ defined in [5]. The function θ translates DL4,×                  R
                                                      D statements in 4LQSDL -                                   4,×
                                                                                                                 D

formulae in CNF. Specifically, θ maps injectively individuals a, constants ed ∈
NC (d), variables w ∈ Vi , and variables u ∈ Ve into sort 0 variables xa , xed , xw ,
xu , the constant concepts > and ⊥, data type terms t, concept terms C, c ∈ Vc ,
                                        1     1
and t ∈ Vd into sort 1 variables X>       , X⊥  , Xt1 , XC1 , Xc1 , Xt1 respectively, and the
universal relation U , abstract role terms R, concrete role terms P , r ∈ Var , and
p ∈ Vcr into sort 3 variables XU3 , XR    3
                                            , XP3 , Xr3 , Xp3 , respectively.1
The mapping θ is defined for HO-DL4,×        D -atomic formulae as follows:
                                    3
θ(R(w1 , w2 )) := hxw1 , xw2 i ∈ XR   , θ(P (w1 , u)) := hxw1 , xu i ∈ XP3 , θ(C(w1 )) :=
xw1 ∈ XC1 , θ(t(u)) := xu ∈ Xt1 , θ(w1 = w2 ) := xw1 = xw2 , θ(t(u)) := xu ∈
Xt1 , θ(c(w1 )) := xw1 ∈ Xc1 , θ(r(w1 , w2 )) := hxw1 , xw2 i ∈ Xr3 , θ(p(w1 , u)) :=
hxw1 , xu i ∈ Xp3 .
Finally, θ is extended to HO-DL4,×     D -conjunctive queries and to substitutions in
a standard way.
     From now on we denote with φKB the 4LQSRDL translation of a DL4,×
                                                               4,×
                                                               D                       D -KB
                                R                                             4,×
KB and with ψQ the 4LQSDL -formula representing the HO-DLD -conjunctive
                                    4,×
                                    D

query Q. The formula φKB is a conjunction of 4LQSRDL -formulae of type
                                                                     4,×
                                                                     D

(∀z1 ) . . . (∀zn )ϕ0 , with ϕ0 a clause of 4LQSRDL -literals, since (a) each DL4,×
                                                         4,×
                                                         D                                D -
                                                                       R
KB KB is a set of statements H such that θ(H) is a 4LQSDL -formula in Table
                                                                      4,×
                                                                      D

1; and (b) φKB is constructed by conjoining the θ(H)s, moving universal quan-
tifiers as inward as possible, and renaming quantified variables as to be pairwise
distinct. The interested reader is referred to [6] for full details.
     Finally, the HOCQA problem for 4LQSRDL -formulae can be stated as follows.
                                                   4,×
                                                   D

Let ψ be a conjunction of 4LQSRDL -literals and φ a 4LQSRDL -formula. The
                                          4,×
                                          D
                                                                            4,×
                                                                            D

HOCQA problem for ψ w.r.t. φ consists in computing the HO answer set of ψ
w.r.t. φ, namely the collection Σ 0 of all the substitutions σ 0 such that M |=
φ ∧ ψσ 0 , for some 4LQSRDL -interpretation M.
                              4,×
                              D


3     Overview of the reasoner

We present a general overview of the reasoner and the main notions and defini-
tions concerning the procedures upon which it is based.
    The input of the reasoner is an OWL ontology serialized in the OWL/XML
syntax and admitting SWRL rules (see Figure 1).


                       Fig. 1. Execution cycle of the reasoner.
1
    The use of level 3 variables to model abstract and concrete role terms is motivated by
    the fact that their elements, that is ordered pairs hx, yi, are encoded in Kuratowski’s
    style as {{x}, {x, y}}, namely as collections of sets of objects.
    If the ontology meets the DL4,×
                                 D requirements, a parser produces the internal
coding of all axioms and assertions of the ontology in set-theoretic terms, as a
list of strings. Then the system builds the data-structures required to execute
the algorithm. In the two subsequent steps, the reasoner constructs a complete
KEγ -tableau TKB whose open branches represent all possible models for the
input KB φKB (see below for the definition of KEγ -tableau). The tableau TKB
is constructed (1) by systematically applying the following two rules: (1a) a
generalization of the KE-elimination rule incorporating the γ-rule, and (1b) the
principle of bivalence rule (PB-rule) (thus constructing all branches of the KEγ -
tableau—see Figure 2), and then (2) processing each open branch ϑ of TKB
by constructing the equivalence classes of the individuals involved in formulae
of type x = y occurring in ϑ and substituting each individual x on ϑ with the
representative of the equivalence class of x. Such step returns the complete KEγ -
tableau. Finally, the reasoner takes as input the internal coding of ψQ , i.e. the
set-theoretic representation of a query Q, and computes the HO-answer set of
ψQ with respect to φKB . The task of computing the complete KEγ -tableau for
φKB is performed by procedure Consistency-DL4,×    D illustrated in [8, Figure 8],
whereas the task of computing the HOCQA answer set of a given query w.r.t
the KB is performed by procedure HOCQAγ -DL4,×      D shown in [8, Figure 9].

   Let ΦKB := {φ : φ is a conjunct of φKB }. The procedure Consistency-DL4,×D
constructs a complete KEγ -tableau TKB for the set ΦKB of the conjuncts of φKB
representing the saturation of the DL4,×
                                     D -KB.
      At this point, it is convenient to give the definition of KEγ -tableaux. Let Φ :=
{C1 , . . . , Cp }, where each Ci is either a 4LQSRDL -literal of the types in Table 1 or
                                                       4,×
                                                       D

a 4LQSRDL -purely universal quantified formula of the form (∀x1 ) . . . (∀xm )(β1 ∨
           4,×
           D

. . . ∨ βn ), with β1 . . . βn 4LQSRDL -literals. T is a KEγ -tableau for Φ if there
                                        4,×
                                        D

exists a finite sequence T1 , . . . , Tt such that (i) T1 is a one-branch tree consisting
of the sequence C1 , . . . , Cp , (ii) Tt = T , and (iii) for each i < t, Ti+1 is obtained
from Ti either by an application of one of the rules (Eγ -rule or PB-rule) in
Figure 2 or by applying a substitution σ to a branch ϑ of Ti (in particular, the
substitution σ is applied to each formula X of ϑ and the resulting branch will be
denoted with ϑσ). In the definition of the Eγ -rule reported in Figure 2, (a) τ :=
{x1 /xo1 . . . xm /xom } is a substitution such that x1 , . . . , xm are the quantified
variables in ψ and xo1 , . . . , xom ∈ Var0 (φKB ); and (b) S β i τ := {β 1 τ, . . . , β n τ } \
{β i τ } is a set containing the complements of all the disjuncts β1 . . . βn to which
the substitution τ is applied, with the exception of the disjunct βi .
    Initially, the procedure Consistency-DL4,×       D constructs a one-branch KE -
                                                                                         γ

tableau TKB for the set ΦKB of conjuncts of φKB . Then, it expands TKB by
systematically applying the Eγ -rule and the PB-rule in Figure 2 to formulae of
type ψ = (∀x1 ) . . . (∀xm )(β1 ∨ . . . ∨ βn ) till they are all fulfilled, giving priority
to the Eγ -rule. Once such rules are no longer applicable, for each open branch ϑ
of the resulting KEγ -tableau, atomic formulae of type x = y occurring in ϑ are
used to compute the equivalence class of x and y. For each open branch ϑ of TKB ,
the equivalence class of each variable occurring in ϑ is obtained by computing
the substitution σϑ such that ϑσϑ does not contain literals of type x = y, for
distinct x, y. The resulting pair (ϑ, σϑ ) is added to the set E.


        ψ        S βi τ γ
                       E -rule                                    PB-rule
              βi τ                                        A | A
        where                                             where A is a literal
        ψ := (∀x1 ) . . . (∀xm )(β1 ∨ . . . ∨ βn ),
        τ := {x1 /xo1 . . . xm /xom },
        and S β i τ := {β 1 τ, ..., β n τ } \ {β i τ },
        for i = 1, ..., n

                     Fig. 2. Expansion rules for the KEγ -tableau.

    The procedure HOCQAγ -DL4,×     D takes as input a query ψQ and the set E
yielded by the procedure Consistency-DL4,×    D and returns the answer set Σ of
                                                                                  0

ψQ w.r.t. φKB . For each open and complete branch ϑ of TKB , the procedure
HOCQAγ -DL4,×   D builds a decision tree Dϑ where each maximal branch induces
a substitution σ 0 such that σϑ σ 0 belongs to the answer set of ψQ w.r.t. to φKB .
    Dϑ is obtained by constructing a stack of its nodes. Initially the stack contains
just the root node (, ψQ σϑ ) of Dϑ , with  the empty substitution. At each step,
the procedure pops out from the stack an element (σ 0 , ψQ σϑ σ 0 ) and iteratively
selects a literal q from the query ψQ σϑ σ 0 and eliminates it from ψQ σϑ σ 0 . Then,
the set of literals t in ϑ matching q is computed by putting Litϑq := {t ∈ ϑ : t =
qρ, for some substitution ρ}. The successors of the current node are computed
by pushing the node (σ 0 ρ, ψQ σϑ σ 0 ρ) in the stack, for each element in Litϑq . If
the current node has the form of (σ 0 , λ), with λ the empty query, the last literal
of ψQ has been treated and the substitution σϑ σ 0 is inserted in Σ 0 . Notice that,
in case of a failing query match, the set Litϑq is empty and then no successor
node is pushed into the stack. Thus, the failing branch of Dϑ is abandoned and
another branch is selected by popping one of the nodes of Dϑ from the stack.
    Computational complexity results can be found in [9].


3.1   Some implementation details

We first show how the internal coding of DL4,×      D -KBs is represented in terms of
      R
4LQSDL -formulae and the data-structures used by the reasoner for representing
        4,×
        D

formulae, nodes, and how KEγ -tableaux are implemented. Then we describe the
most relevant functions that implement the procedures Consistency-DL4,×        D and
         γ     4,×                                                        4,×
HOCQA -DLD and also illustrate an example of reasoning in DLD .
    To begin with, 4LQSRDL -variables, quantifiers, Boolean operators, set-theoretic
                                 4,×
                                 D
                                                                                  i
relators, and pairs are mapped into strings as follows. Variables of type Xname
are mapped into strings of the form Vi {name}. For the sake of uniformity, vari-
ables of sort 0 are denoted with X 0 , Y 0 , . . ., whereas individuals a, concepts C,
and roles R of a DL4,×
                     D -KB are respectively mapped into the variables Xa , XC ,
                                                                                0   1
        3
and XR    , according to the function θ described in [5]. The symbols ∀, ∧, ∨, ¬∧,
¬∨ are mapped into the strings $FA, $AD, $OR, $DA, $RO, respectively. The re-
lators ∈, 6∈, =, 6= are mapped into the strings $IN, $NI, $EQ, $QE, respectively.
A pair hX10 , X20 i is mapped into the string $OA V01 $CO V02 $AO, where $OA
represents the bracket “h”, $AO the bracket “i”, and $CO the comma symbol.
    Then, data-structures for representing the KB are built. 4LQSRDL -variables
                                                                          4,×
                                                                          D

are implemented by means of the class Var that has four fields. The field type
of type integer indicates the sort of the variable, the field name of type string
represents the name of the variable, and the field var of type integer represents
a free variable if set to 0, and a quantified variable if set to 1. The field index
stores the position of the variable in the vector VVL, delegated to collect free
variables. Quantified and free variables are collected in the vectors VQL and VVL
respectively, which provide a subvector for each sort of variable.
    The operators admitted in 4LQSRDL , internally coded as strings, are mapped
                                         4,×
                                         D

into three vectors that are fields of the class Operator. Specifically, the vector
boolOp contains the values $OR, $AD, $RO, $DA, the vector setOp the values $IN,
$EQ, $NI, $QE, $OA, $AO, $CO, and the vector qutOp the value $FA.
    4LQSRDL -literals are stored using the class Lit that has two fields. The field
            4,×
            D

litOp of type integer represents the operator of the formula and corresponds
to the index of one of the first four elements of the vector setOp. The field
components is a vector whose elements point to the variables involved in the
literal and stored in VQL and VVL.
    4LQSRDL -formulae are represented by the class Formula, having a binary
            4,×
            D

tree structure, whose nodes contain objects of the class Lit. The left and right
children contain the left and right subformula, respectively. The class Formula
contains the following fields: the field lit of type pointer to Lit represents the lit-
eral; the field operand represents the propositional operator, and its value is the
index of the corresponding element of the vector boolOp; the field psubformula
of type pointer to Formula is the pointer to the father node, whereas the fields
lsubformula and rsubformula contain the pointers to the nodes representing
the left and the right component of the formula, respectively.
    The procedure Consistency-DL4,×   D is based on the data-structure implemented
by the class Tableau. This class uses the instances of the class Node that repre-
sents the nodes of the KEγ -tableau. The class Node has a tree-shaped structure
and four fields: the field setFormula, of type vector of Formula, that collects
the formulae of the current node, and three pointers to instances of the class
Node. These are the leftchild, rightchild, and father fields, which point to
the left child node, right child node, and father node, respectively.
    The root node of the class Tableau contains the field root of type pointer
to Node. The fields openbranches and closedbranches collect the set of open
branches and of closed branches, respectively. In addition, the class Tableau
is provided with the field EqSet that is a three-dimensional vector of integers
storing the equivalence classes induced by atomic formulae of type X 0 = Y 0 . In
particular, EqSet stores a vector containing the indices of VVL corresponding to
the variables belonging to the equivalence classes.
     As mentioned above, the reasoner takes as input an OWL ontology compat-
ible with the DL4,×D requirements, also admitting SWRL rules, and serialized in
the OWL/XML syntax. As first step, the function readOWLXMLontology pro-
duces the internal coding of all axioms and assertions of the ontology, yielding a
list of strings. Then the reasoner builds from the output of readOWLXMLontology
the objects of type Formula that implement the 4LQSRDL -formulae represent-
                                                            4,×
                                                            D

ing the KB, and stores them in the field root of an object of type Tableau.
In this phase, formulae are transformed in CNF and universal quantifiers are
moved as inward as possible and renamed in such a way as to be pairwise dis-
tinct. The object of type Tableau representing the KEγ -tableau is the input
to the procedure expandGammaTableau that expands the KEγ -tableau by iter-
atively selecting and fulfilling purely universal quantified input formulae. Once
a purely universal quantified formula has been selected, expandGammaTableau
builds iteratively the set of substitutions τ to be applied to the selected formula.
A substitution τ is a map from the indices of the quantified variables of the
formula, selected in order of appearance, to the elements of the vector VVL. The
implementation of τ applies standard techniques for computing the variations
with repetition of the set of indices of the elements of VVL taken to k by k, where
k is the number of quantified variables occurring in the selected formula.
    The procedure expandGammaTableau fulfills the formula selected by system-
atically applying the functions EGrule with the current τ and PBrule, respec-
tively implementing the Eγ -rule and the PB-rule. More precisely, it works as
follows. The disjuncts of the current formula to which τ is applied are stored in
a temporary vector and selected iteratively. If a disjunct has its negation on the
branch, it is removed from the temporary vector. Once all the elements of the
temporary vector have been selected, if the last one does not have its negation
on the branch, then EGrule is applied to the formula and the last element of
the temporary vector is inserted in the branch according to Figure 2. If there is
more than one element left in the temporary vector, then the procedure PBrule
is applied. In case the stack is empty, a contradiction is found and the branch
gets closed and inserted in the vector closedbranches.
    If the procedure expandGammaTableau terminates with some elements in
openbranches, then the reasoner builds the set of equivalence classes of the vari-
ables involved in formulae of type X 0 = Y 0 , for each element of openbranches
by means of the procedure buildsEqSet. The latter procedure updates the field
EqSet of the object of type Tableau with the new information concerning the
set of equivalence classes. After the execution of buildsEqSet, if openbranches
contains some elements, a consistent KB is returned.
    Procedure HOCQAγ -DL4,×  D is implemented by the function performQuery
that takes as input the object of type Tableau returned by buildsEqSet and
a string representing the internal coding of the input query ψQ , and returns an
object of type QueryManager storing, among other information, the answer set of
ψQ w.r.t. φKB . The function performQuery uses an object of type QueryManager
that stores the input query ψQ as a string, an object of type Formula representing
ψQ , and the answer set of ψQ w.r.t. φKB , for each element of openbranches. The
answer set is implemented by endowing the object of type QueryManager with
the pair of vectors VarMatch. The first vector of VarMatch contains an integer for
each element in openbranches: this is set to 1 if the corresponding branch has
solution, 0 otherwise. The second (three-dimensional) vector contains for each
element in openbranches a vector of solutions, each one constituted by a vector
of pairs of pointers to Var. The first Var of such pair is a variable belonging to
the query, whereas the second Var is the matched individual.
    For each element in openbranches, the function performQuery implements
a decision tree by means of a stack that keeps track of the partial solutions of the
query, as nodes of the decision tree. Such a stack, called matchSet, is constituted
by a vector of pairs of objects of type Var such that the first one represents the
query variable and the second one the matched element. Initially, matchSet is
empty. At first step, the procedure selects the first conjunct of the query and,
for each match found, it pushes in matchSet a vector of pairs representing the
match. The procedure selects iteratively the conjuncts of the query and then
applies to the selected conjunct the substitution that is currently at the top of
matchSet. If the literal obtained by the application of such partial solution has
one or more matches in the branch, the resulting substitutions are pushed in
matchSet. Once all the literals of the query have been processed, if matchSet is
not empty, it contains the leaves of the maximal branches of the decision tree,
which are all added to VarMatch.


                                   ×
3.2     Example of reasoning in DL4,
                                  D


Let us consider the ontology displayed in Figure 3.


                        Fig. 3. OWL ontology of the example.

      The KB in terms of 4LQSRDL      4,×
                                      D
                                            is the following formula:
                                           3
            φKB := ¬(hxEva , xAnn i ∈ XM     other ) ∧
                                        3                                3
                   hxAnn , xAnn i ∈ XRelative ∧ hxEva , xEva i ∈ XRelative      ∧
                                                 3                         3
                   (∀z1 )(∀z2 )(¬(hz1 , z2 i ∈ XM other ) ∨ hz1 , z2 i ∈ XRelative )

                          3
   Let ψQ = hz, xEva i ∈ XM other be a query represented in set-theoretic terms.
A complete KEγ -tableau for φKB and the decision trees constructed for the
evaluation of ψQ on each open branch of the KEγ -tableau are shown in Figure
4. Notice that, the decision tree constructed on the leftmost open branch of the
KEγ -tableau provides no solution.


      Fig. 4. KEγ -tableau for φKB and decision trees for the evaluation of ψQ .

    The internal representation of the OWL ontology is shown in Figure 5. The
KEγ -tableau computed by the reasoner and the evaluation of the query ψQ are
reported in Figure 6. Finally, Figure 7 shows a performance comparison between
our implementation of the KE-tableau presented in [7] and the KEγ -tableau
system for DL4,×
               D presented in this paper. The metric used in the benchmarking
is the number of models of the input KB computed by the reasoners and the
time required to compute such models. As shown in Figure 7, the KEγ -tableau
has a better performance than the KE-tableau up to about 400%, even if in some
cases the performances of the two systems are comparable, as shown in the plot.
We conclude that the KEγ -tableau system is always convenient, also because the
expansions of quantified formulae are not stored in memory.


                   Fig. 5. Internal representation of the ontology.


   Fig. 6. The KEγ -tableau and the evaluation of ψQ computed by the reasoner.
        Fig. 7. Comparison between KE-tableau and KEγ -tableau systems.


4   Conclusions
We presented a C++ implementation of a KEγ -tableau system for the most
widespread reasoning tasks of DL4,×                                       4,×
                                   D , such as consistency checking of DLD -KBs
                                                   4,×
and a generalization of the CQA problem for DLD , , called HOCQA problem,
admitting conjunctive queries with variables of three sorts. These problems have
been addressed by translating DL4,×                                4,×
                                    D -KBs and higher-order DLD -conjunctive
queries in terms of formulae of the set-theoretic fragment 4LQSRDL . The rea-
                                                                     4,×
                                                                     D

soner is an improvement of the KE-tableau system introduced in [7] to check
consistency of DL4,×
                   D -KBs, as it admits a generalization of the KE-elimination
rule incorporating the γ-rule. The reasoner takes as input OWL ontologies com-
patible with the specifications of DL4,×
                                      D serialized in the OWL/XML format and
admitting SWRL rules.
    Finally, we showed that the reasoner presented in this paper is more efficient
than the one introduced in [7], by means of suitable benchmark test sets.
    We plan to extend the set-theoretic fragment underpinning the reasoner to
include also a restricted version of the operator of relational composition. This
will allow ones to reason with description logics that admit full existential and
universal quantification. In addition, we intend to improve our reasoner so as
to deal with the reasoning problem of ontology classification. Then, we shall
compare the resulting reasoner with existing well-known reasoners such as Her-
miT [12] and Pellet [18], providing some benchmarking. We also plan to allow
data type reasoning by either integrating existing solvers for the Satisfiability
Modulo Theories (SMT) problem or by designing ad-hoc new solvers. Finally, as
each branch of the KEγ -tableau can be computed by a single processing unit, we
plan to implement a parallel version of the software by using the Nvidia CUDA
library.

References
 1. C. Cantale, D. Cantone, M. Nicolosi-Asmundo, and D.F. Santamaria. Dis-
    tant Reading Through Ontologies: The Case Study of Catania’s Benedictines
    Monastery. JIS.it, 8,3 (September 2017).
 2. D. Cantone, A. Ferro, and E. G. Omodeo. Computable set theory. Number 6 in
    International Series of Monographs on Computer Science, Oxford Science Publi-
    cations. Clarendon Press, Oxford, UK, 1989.
 3. D. Cantone and M. Nicolosi-Asmundo. On the satisfiability problem for a 4-level
    quantified syllogistic and some applications to modal logic. Fundamenta Informat-
    icae, 124(4):427–448, 2013.
 4. D. Cantone, M. Nicolosi-Asmundo, and D. F. Santamaria. Conjunctive query
    answering via a fragment of set theory. In Proc. of ICTCS 2016, Lecce, September
    7-9, CEUR-WS Vol. 1720, pp. 23–35.
 5. D. Cantone, M. Nicolosi-Asmundo, and D. F. Santamaria. A set-theoretic approach
    to ABox reasoning services. In Costantini S., Franconi E., Van Woensel W.,
    Kontchakov R., Sadri F., Roman D. Rules and Reasoning. RuleML+RR 2017.,
    Lecture Notes in Computer Science, vol 10364. Springer, 2017.
 6. D. Cantone, M. Nicolosi-Asmundo, and D. F. Santamaria. A set-theoretic approach
    to ABox reasoning services. CoRR, 1702.03096, 2017. Extended version.
 7. D. Cantone, M. Nicolosi-Asmundo, and D. F. Santamaria. A C++ reasoner for
                               ×
    the description logic DL4,D . In Proc.of CILC 2017, 26-29 September 2017, Naples,
    Italy. CEUR WS, ISSN 1613-0073, Vol. 1949, pp. 276-280., 2017.
 8. D. Cantone, M. Nicolosi-Asmundo, and D. F. Santamaria. A set-based reasoner
                                   ×
    for the description logic DL4,D . CoRR, 1805.08606, 2018. Extended version.
 9. D. Cantone, M. Nicolosi-Asmundo, and D. F. Santamaria. An optimized KE-
                                                                         ×
    tableau-based system for reasoning in the description logic DL4,    D .    CoRR,
    1804.11222, 2018. Extended version.
10. D. Cantone, M. Nicolosi-Asmundo, D. F. Santamaria, and F. Trapani. Onto-
    ceramic: an OWL ontology for ceramics classification. In Proc. of CILC 2015,
    CEUR-WS, vol. 1459, pp. 122–127, Genova, July 1-3, 2015.
11. D. Cantone, E. Omodeo, and A. Policriti. Set theory for computing: from decision
    procedures to declarative programming with sets. Monographs in Computer Science.
    Springer-Verlag, New York, NY, USA, 2001.
12. B. Glimm, I. Horrocks, B. Motik, G. Stoilos, and Z. Wang. HermiT: An OWL 2
    Reasoner. Journal of Automated Reasoning, 53(3):245–269, 2014.
13. I. Horrocks, O. Kutz, and U. Sattler. The even more irresistible SROIQ. In Proc.
    10th Int. Conf. on Princ. of Knowledge Representation and Reasoning, (Doherty,
    P. and Mylopoulos, J. and Welty, C. A., eds.), pages 57–67. AAAI Press, 2006.
14. M. Mondadori M. D’Agostino. The taming of the cut. Classical refutations with
    analytic cut. Journal of Logic and Computation, 4:285–319, 1994.
15. B. Motik and I. Horrocks. OWL datatypes: Design and implementation. In Proc.
    of the 7th Int. Semantic Web Conference (ISWC 2008), volume 5318 of LNCS,
    pages 307–322. Springer, October 26–30 2008.
16. D. F. Santamaria. A Set-Theoretical Representation for OWL 2 Profiles. LAP
    Lambert Academic Publishing, ISBN 978-3-659-68797-6, 2015.
17. Jacob T. Schwartz, Domenico Cantone, and Eugenio G. Omodeo. Computational
    Logic and Set Theory: Applying Formalized Logic to Analysis. Texts in Computer
    Science. Springer-Verlag New York, Inc., 2011.
18. E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, and Y. Katz. Pellet: A practical
    OWL-DL reasoner. J. Web Sem., 5(2):51–53, 2007.
19. R. M. Smullyan. First-order Logic. Dover books on advanced Math. Dover, 1995.