A set-based reasoner for the description logic DL4,× D Domenico Cantone, Marianna Nicolosi-Asmundo, and Daniele Francesco Santamaria University of Catania, Dept. of Mathematics and Computer Science email: {cantone,nicolosi,santamaria}@dmi.unict.it Abstract. We present a KE-tableau-based implementation of a rea- soner for a decidable fragment of (stratified) set theory expressing the × description logic DLh4LQSR,× i(D) (DL4, D , for short). Our application × solves the main TBox and ABox reasoning problems for DL4, D . In par- × ticular, it solves the consistency problem for DL4, D -knowledge bases rep- resented in set-theoretic terms, and a generalization of the Conjunctive Query Answering problem in which conjunctive queries with variables of three sorts are admitted. The reasoner, which extends and optimizes × a previous prototype for the consistency checking of DL4, D -knowledge 4,× bases (see [7]), is implemented in C++. It supports DLD -knowledge bases serialized in the OWL/XML format, and it admits also rules ex- pressed in SWRL (Semantic Web Rule Language). 1 Introduction A wealth of decidability results has been collected over the years within the re- search field of Computable Set Theory [2, 11, 17]. However, only recently some of these results have been applied in the context of knowledge representation and reasoning for the semantic web. Such efforts have been motivated by the characteristics of the set-theoretic fragments considered, as they provide very expressive unique formalisms that combine the modelling capabilities of a rule language with the constructs of description logics. The decidable multi-sorted quantified set-theoretic fragment 4LQSR [3] is appropriate in this sense, in con- sideration of the fact that its decision procedure is efficiently implementable. We recall that the language of 4LQSR involves variables of four sorts, pair terms, and a restricted form of quantification. In [5], the theory 4LQSR has been used to represent the expressive description logic DL4,× D by means of a suitable translation mapping. Moreover, decidability of the most widespread reasoning problems for DL4,× D , such as the consistency prob- lem and the Conjunctive Query Answering (CQA) problem for DL4,× D -knowledge bases (KBs) were proved via a reduction to the satisfiability problem for 4LQSR . Since 4LQSR admits variables of four sorts, the CQA problem was generalized in such a way as to admit queries over three sorts of variables. Such a general- ization, called Higher-Order Conjunctive Query Answering (HOCQA) problem can be instantiated to the most widespread reasoning tasks for DL4,× D -ABox. The description logic DL4,× D admits Boolean operators on concepts and ab- stract roles, concept domain and range, and existential and minimum cardinality restriction on the left-hand side of inclusion axioms. It also supports role chains on the left-hand side of inclusion axioms and properties on roles such as transi- tivity, symmetry, and reflexivity. In [4], its consistency problem has been shown to be NP-complete under not very restrictive constraints. Such a low complexity result depends on the fact that existential quantification cannot appear on the right-hand side of inclusion axioms. Nonetheless, DL4,× D turns out to be more expressive than other low complexity logics such as OWL RL [16] and there- fore it is very suitable for representing real-world ontologies. For instance, the restricted version of DL4,× D mentioned above allows one to express several OWL ontologies, such as ArcheOntology [16] and OntoCeramic [10], for the classification of archaeological finds, and ArchivioMuseoFabbrica [1], concerning the renovation of the Monastery of San Nicola l’Arena in Catania by the architect Giancarlo De Carlo. Since existential quantification is admitted only on the left-hand side of inclusion axioms, DL4,× D is less expressive than logics such as SROIQ(D) [13] as long as the generation of new individuals is concerned. On the other hand, DL4,× D is more liberal than SROIQ(D) in the definition of role inclusion ax- ioms, as the roles involved in DL4,× D are not subject to any ordering relationship, and the notion of simple role is not needed. For example, the role hierarchy pre- sented in [13, page 2] is not expressible in SROIQ(D), but can be represented in DL4,× 4,× D . In addition, DLD is a powerful rule language able to express rules with negated atoms such as Person(?p) ∧ ¬hasHome(?p, ?h) =⇒ HomelessPerson(?p) that are not supported by the SWRL language. In [7], we presented a first effort to implement in C++ a KE-tableau-based decision procedure for the consistency problem of DL4,× D -KBs, by resorting to the algorithm introduced in [5]. The choice of KE-tableau systems [14], instead of traditional semantic tableaux [19], was motivated by the fact that KE-tableau systems introduce an analytic cut rule which permits to construct trees whose branches define mutually exclusive situations, thus avoiding the proliferation of redundant branches, typical of Smullyan’s semantic tableaux [19]. Thus, given as input a consistent KB, the procedure yields a KE-tableau whose open branches induce distinct models of the KB. Otherwise, a closed KE-tableau is returned. In this contribution we improve the reasoner presented in [7] by introducing a system called KEγ -tableau which admits a generalization of the KE-elimination rule incorporating the γ-rule, namely the expansion rule for handling universally quantified formulae. The reasoner also includes a procedure to compute the HOCQA problem for DL4,× D . Finally, through suitable benchmark tests, we show that such a novel reasoner is more efficient than the one introduced in [7]. 2 Preliminaries 2.1 The set-theoretic fragment We summarize the set-theoretic notions underpinning the description logic DL4,× D and its reasoning tasks. For the sake of conciseness, we avoid to report here the syntax and semantics of the whole 4LQSR theory (the interested reader can find it in [3] together with the decision procedure for its satisfiability problem). Thus, we focus on the 4LQSR -formulae de facto involved in the set-theoretic representation of DL4,× R D , namely propositional combinations of 4LQS -literals (atomic formulae R or their negations) and 4LQS purely universal formulae of the types displayed in Table 1. The class of such 4LQSR -formulae is called 4LQSRDL . 4,× D We recall that the fragment 4LQSR admits four collections, Vari , of variables of sort i denoted by X i , Y i , Z i , . . ., for i = 0, 1, 2, 3 (variables of sort 0 are also denoted by x, y, z, . . .). Besides variables, also pair terms of the form hx, yi, with x, y ∈ Var0 , are allowed. Since the types of formulae displayed in Table 1 do not contain variables of sort 2, here we limit ourselves only to notions and definitions relative to 4LQSRDL -formulae involving variables of sorts 0, 1, and 3. 4,× D Literals of level 0 Purely universal quantified formulae of level 1 1 3 (∀z1 ) . . . (∀zn )ϕ0 , where z1 , . . . , zn ∈ Var0 and ϕ0 is x = y, x ∈ X , hx, yi ∈ X any propositional combination of 1 3 ¬(x = y), ¬(x ∈ X ), ¬(hx, yi ∈ X ) literals of level 0. Table 1. Types of literals and quantified formulae admitted in 4LQSRDL4,× . D The variables z1 , . . . , zn are said to occur quantified in (∀z1 ) . . . (∀zn )ϕ0 . A variable occurs free in a 4LQSRDL -formula ϕ if it does not occur quantified in 4,× D any subformula of ϕ. For i = 0, 1, 3, we denote with Vari (ϕ) the collections of variables of sort i occurring free in ϕ. Given sequences of distinct variables x (in Var0 ), X 1 (in Var1 ), and X 3 (in Var3 ), of length n, m, and q, respectively, and sequences of (not necessarily distinct) variables y (in Var0 ), Y 1 (in Var1 ), and Y 3 (in Var3 ), also of length n, m, and q, respectively, the 4LQSRDL -substitution σ := {x/y, X 1 /Y 1 , X 3 /Y 3 } 4,× D is the mapping ϕ 7→ ϕσ such that, for any given universal quantified 4LQSRDL - 4,× D formula ϕ, ϕσ is the result of replacing in ϕ the free occurrences of the variables xi in x (for i = 1, . . . , n) with the corresponding yi in y, of Xj1 in X 1 (for j = 1, . . . , m) with Yj1 in Y 1 , and of Xh3 in X 3 (for h = 1, . . . , q) with Yh3 in Y 3 , respectively. A substitution σ is free for ϕ if the formulae ϕ and ϕσ have exactly the same occurrences of quantified variables. The empty substitution, denoted , satisfies ϕ = ϕ, for each 4LQSRDL -formula ϕ. 4,× D A 4LQSRDL -interpretation is a pair M = (D, M ), where D is a nonempty 4,× D collection of objects (called domain or universe of M) and M is an assignment over the variables in Vari , for i = 0, 1, 3, such that: M X 0 ∈ D, M X 1 ∈ P(D), M X 3 ∈ P(P(P(D))), where X i ∈ Vari , for i = 0, 1, 3, and P(s) denotes the powerset of s. Pair terms are interpreted à la Kuratowski, and therefore we put M hx, yi := {{M x}, {M x, M y}}. Next, let - M = (D, M ) be a 4LQSRDL -interpretation, 4,× D - x1 , . . . , xn ∈ Var0 , and - u1 , . . . , un ∈ D. By M[x/u], we denote the interpretation M0 = (D, M 0 ) such that M 0 xi = ui (for i = 1, . . . , n), and which otherwise coincides with M on all remaining vari- ables. For a 4LQSRDL -interpretation M = (D, M ) and a formula ϕ, the satisfi- 4,× D ability relationship M |= ϕ is recursively defined over the structure of ϕ as fol- lows. Literals are evaluated in a standard way, based on the usual interpretation of the predicates ‘∈’ and ‘=’, and of the propositional negation ‘¬’. Compound formulae are interpreted according to the standard rules of propositional logic. Finally, purely universal formulae are evaluated as follows: - M |= (∀z1 ) . . . (∀zn )ϕ0 iff M[z/u] |= ϕ0 , for all u ∈ Dn . If M |= ϕ, then M is said to be a 4LQSRDL -model for ϕ. A 4LQSRDL - 4,× D 4,× D formula is said to be satisfiable if it has a 4LQSRDL -model. A 4LQSRDL -formula 4,× D 4,× D is valid if it is satisfied by all 4LQSRDL -interpretations. 4,× D 2.2 The logic DLh4LQSR,× i(D) It is convenient to recall the main notions and definitions concerning the de- scription logic DLh4LQSR,× i(D) (also called DL4,× D ) [4]. Let RA , RD , C, and Ind be denumerable pairwise disjoint sets of abstract role names, concrete role names, concept names, and individual names, respec- tively. We assume that the set of abstract role names RA contains a name U denoting the universal role. Data types are introduced through the notion of data type maps, defined ac- cording to [15] as follows. A data type map is a quadruple D = (ND , NC , NF , ·D ), where ND is a finite set of data types, NC is a function assigning a set of con- stants NC (d) to each data type d ∈ ND , NF is a function assigning a set of facets NF (d) to each d ∈ ND , and ·D is (i) a function assigning a data type interpretation dD to each data type d ∈ ND , (ii) a facet interpretation f D ⊆ dD to each facet f ∈ NF (d), and (iii) a data value eD d ∈ d D to every constant ed ∈ NC (d). Facets determine subsets of data values considered of interest in a specific application domain. We shall assume that the interpretations of the data types in ND are nonempty pairwise disjoint sets. (a) DL4,× 4,× 4,× D -data types, (b) DLD -concepts, (c) DLD -abstract roles, and (d) DLD - 4,× concrete role terms are defined according to the DL standard notation (see [13]) as follows: (a) t1 , t2 −→ dr | ¬t1 | t1 u t2 | t1 t t2 | {ed } , (b) C1 , C2 −→ A | > | ⊥ | ¬C1 | C1 tC2 | C1 uC2 | {a} | ∃R.Self |∃R.{a}|∃P.{ed } , (c) R1 , R2 −→ S | U | R1− | ¬R1 | R1 tR2 | R1 uR2 | RC1 | | R|C1 | RC1 | C2 | id(C) | C1 × C2 , (d) P1 , P2 −→ T | ¬P1 | P1 t P2 | P1 u P2 | PC1 | | P|t1 | PC1 |t1 , where dr is a data range for D, t1 , t2 are data type terms, ed is a constant in NC (d), a is an individual name, A is a concept name, C1 , C2 are DL4,× D -concept 4,× terms, S is an abstract role name, R, R1 , R2 are DLD -abstract role terms, T is a concrete role name, and P, P1 , P2 are DL4,× D -concrete role terms. Notice that data type terms are intended to represent derived data types. A DL4,× 4,× D -KB is a triple K = (R, T , A) such that R is a DLD -RBox, T is a 4,× 4,× DLD -TBox, and A a DLD -ABox. A DL4,×D -RBox is a collection of statements of the following types: R1 ≡ R2 , R1 v R 2 , R1 . . . Rn v Rn+1 , Sym(R1 ), Asym(R1 ), Ref(R1 ), Irref(R1 ), Dis(R1 , R2 ), Tra(R1 ), Fun(R1 ), R1 ≡ C1 × C2 , P1 ≡ P2 , P 1 v P2 , Dis(P1 , P2 ), Fun(P1 ), where R1 , R2 are DL4,× 4,× D -abstract role terms, C1 , C2 are DLD -abstract concept 4,× terms, and P1 , P2 are DLD -concrete role terms. Any expression of the type R1 . . . Rn v R, where R1 , . . . , Rn , R are DL4,× D -abstract role terms, is called a role inclusion axiom (RIA). A DL4,×D -T Box is a set of statements of the types: - C1 ≡ C2 , C1 v C2 , C1 v ∀R.C2 , ∃R.C1 v C2 , ≥nR.C1 v C2 , C1 v ≤nR.C2 , - t1 ≡ t2 , t1 v t2 , C1 v ∀P.t1 , ∃P.t1 v C1 , ≥nP.t1 v C1 , C1 v ≤nP.t1 , where C1 , C2 are DL4,× 4,× D -concept terms, t1 , t2 data type terms, R a DLD -abstract 4,× role term, and P a DLD -concrete role term. Statements of the form C v D, where C and D are DL4,× D -concept terms, are general concept inclusion axioms. A DL4,× D -ABox is a set of individual assertions of the forms: a : C1 , (a, b) : R1 , a = b, a 6= b, ed : t1 , (a, ed ) : P1 , with C1 a DL4,×D -concept term, d a data type, t1 a data type term, R1 a DLD - 4,× 4,× abstract role term, P1 a DLD -concrete role term, a, b individual names, and ed a constant in NC (d). The semantics of DL4,× I I D is given via interpretations of the form I = (∆ , ∆D , · ), I D where ∆ and ∆D are nonempty disjoint domains such that d ⊆ ∆D , for every d ∈ ND , and ·I is an interpretation function. The interpretation of concepts and roles, axioms and assertions is defined in [8, Table 2]. Let R, T , and A be as above. An interpretation I = (∆I , ∆D , ·I ) is a D-model of R (resp., T ), and we write I |=D R (resp., I |=D T ), if I satisfies each axiom in R (resp., T ) according to the semantic rules in [8, Table 2]. Analogously, I = (∆I , ∆D , ·I ) is a D-model of A, and we write I |=D A, if I satisfies each assertion in A, according to [8, Table 2]. A DL4,× D -KB K = (A, T , R) is consistent if there exists a D-model I = (∆I , ∆D , ·I ) of A, T , and R. × The HOCQA problem for DL4, D . We recall that the problem of Higher- Order Conjuctive Query Answering (HOCQA) for DL4,× D , introduced in [5], is a generalization of the Conjunctive Query Answering problem for DL4,× D defined in [4]. The HOCQA problem for DL4,× D relies on the notion of Higher-Order (HO) DL4,× D -conjunctive query, admitting variables of three sorts: individual and data type variables, concept variables, and role variables. The HOCQA problem for DL4,× D consists in finding the HO answer set of an HO-DLD -conjunctive query 4,× 4,× (see below) with respect to a DLD -KB. Specifically, let Vi = {v1 , v2 , . . .}, Ve = {e1, e2, . . .}, Vd = {t1, t2, . . .}, Vc = {c1 , c2 , . . .}, Var = {r1 , r2 , . . .}, and Vcr = {p1 , p2 , . . .} S be pairwise disjoint de- numerably infinite sets of variables disjoint from Ind, {NC (d) : d ∈ ND }, C, RA , and RD . HO-DL4,× D -atomic formulae are expressions of the following types: R(w1 , w2 ), P (w1 , u), C(w1 ), t(u), r(w1 , w2 ), p(w1 , u), c(w1 ), t(u), w1 = w2 , where w1 , w2 ∈ Vi ∪Ind, u ∈ Ve ∪ {NC (d) : d ∈ ND }, R is a DL4,× S D -abstract role term, P is a DL4,× D -concrete role term, C is a DL 4,× D -concept term, t is a DL4,×D - 4,× data type term, r ∈ Var , p ∈ Vcr , c ∈ Vc , t ∈ Vd . A HO-DLD -atomic formula containing no variables is said to be ground. A HO-DL4,× D -literal is a HO-DLD - 4,× 4,× atomic formula or its negation. A HO-DLD -conjunctive query is a conjunction of HO-DL4,× D -literals. We denote with λ the empty HO-DLD -conjunctive query. 4,× Let v1 , . . . , vn ∈ Vi , e1 , . . . , eg ∈ Ve , t1 , . . . , tl ∈ Vd ,c1 , . .S. , cm ∈ Vc , r1 , . . . , rk ∈ Var , p1 , . . . , ph ∈ Vcr , o1 , . . . , on ∈ Ind, ed1 , . . . , edg ∈ {NC (d) : d ∈ ND }, C1 , . . . , Cm ∈ C, R1 , . . . , Rk ∈ RA , and P1 , . . . , Ph ∈ RD . A substitution σ := {v1 /o1 , . . . , vn /on , e1 /ed1 , . . . , eg /edg , t1 /t1 , . . . , tl /tl , c1 /C1 , . . . , cm /Cm , r1 /R1 , . . . , rk /Rk , p1 /P1 , . . . , ph /Ph } is a map such that, for every HO-DL4,× D - literal L, Lσ is obtained from L by replacing: (a) the occurrences of vi in L with oi , for i = 1, . . . , n; (b) the occurrences of eb in L with db , for b = 1, . . . , g; (c) the occurrences of ts in L with ts , for s = 1, . . . , l; (d) the occurrences of cj in L with Cj , for j = 1, . . . , m; (e) the occurrences of r` in L with R` , for ` = 1, . . . , k; (f) the occurrences of pt in L with Pt , for t = 1, . . . , h. Substitutions can be extended to HO-DL4,× D -conjunctive queries in the usual way. Let Q := (L1 ∧ . . . ∧ Lm ) be a HO-DL4,× D -conjunctive query, and KB a DLD - 4,× KB. A substitution σ involving exactly the variables occurring in Q is a solution for Q w.r.t. KB, if there exists a DL4,× D -interpretation I such that I |=D KB and I |=D Qσ. The collection Σ of the solutions for Q w.r.t. KB is the higher-order answer set of Q w.r.t. KB. Then the higher-order conjunctive query answering problem for Q w.r.t. KB consists in finding the HO answer set Σ of Q w.r.t. KB. The HOCQA problem for DL4,× D can be instantiated to the most significant ABox reasoning problems for DL4,× D (see [5]). × Representing DL4, D in set-theoretic terms. DL4,× D -KBs and HO-DLD - 4,× conjunctive queries can be represented in set-theoretic terms by exploiting a mapping θ defined in [5]. The function θ translates DL4,× R D statements in 4LQSDL - 4,× D formulae in CNF. Specifically, θ maps injectively individuals a, constants ed ∈ NC (d), variables w ∈ Vi , and variables u ∈ Ve into sort 0 variables xa , xed , xw , xu , the constant concepts > and ⊥, data type terms t, concept terms C, c ∈ Vc , 1 1 and t ∈ Vd into sort 1 variables X> , X⊥ , Xt1 , XC1 , Xc1 , Xt1 respectively, and the universal relation U , abstract role terms R, concrete role terms P , r ∈ Var , and p ∈ Vcr into sort 3 variables XU3 , XR 3 , XP3 , Xr3 , Xp3 , respectively.1 The mapping θ is defined for HO-DL4,× D -atomic formulae as follows: 3 θ(R(w1 , w2 )) := hxw1 , xw2 i ∈ XR , θ(P (w1 , u)) := hxw1 , xu i ∈ XP3 , θ(C(w1 )) := xw1 ∈ XC1 , θ(t(u)) := xu ∈ Xt1 , θ(w1 = w2 ) := xw1 = xw2 , θ(t(u)) := xu ∈ Xt1 , θ(c(w1 )) := xw1 ∈ Xc1 , θ(r(w1 , w2 )) := hxw1 , xw2 i ∈ Xr3 , θ(p(w1 , u)) := hxw1 , xu i ∈ Xp3 . Finally, θ is extended to HO-DL4,× D -conjunctive queries and to substitutions in a standard way. From now on we denote with φKB the 4LQSRDL translation of a DL4,× 4,× D D -KB R 4,× KB and with ψQ the 4LQSDL -formula representing the HO-DLD -conjunctive 4,× D query Q. The formula φKB is a conjunction of 4LQSRDL -formulae of type 4,× D (∀z1 ) . . . (∀zn )ϕ0 , with ϕ0 a clause of 4LQSRDL -literals, since (a) each DL4,× 4,× D D - R KB KB is a set of statements H such that θ(H) is a 4LQSDL -formula in Table 4,× D 1; and (b) φKB is constructed by conjoining the θ(H)s, moving universal quan- tifiers as inward as possible, and renaming quantified variables as to be pairwise distinct. The interested reader is referred to [6] for full details. Finally, the HOCQA problem for 4LQSRDL -formulae can be stated as follows. 4,× D Let ψ be a conjunction of 4LQSRDL -literals and φ a 4LQSRDL -formula. The 4,× D 4,× D HOCQA problem for ψ w.r.t. φ consists in computing the HO answer set of ψ w.r.t. φ, namely the collection Σ 0 of all the substitutions σ 0 such that M |= φ ∧ ψσ 0 , for some 4LQSRDL -interpretation M. 4,× D 3 Overview of the reasoner We present a general overview of the reasoner and the main notions and defini- tions concerning the procedures upon which it is based. The input of the reasoner is an OWL ontology serialized in the OWL/XML syntax and admitting SWRL rules (see Figure 1). Fig. 1. Execution cycle of the reasoner. 1 The use of level 3 variables to model abstract and concrete role terms is motivated by the fact that their elements, that is ordered pairs hx, yi, are encoded in Kuratowski’s style as {{x}, {x, y}}, namely as collections of sets of objects. If the ontology meets the DL4,× D requirements, a parser produces the internal coding of all axioms and assertions of the ontology in set-theoretic terms, as a list of strings. Then the system builds the data-structures required to execute the algorithm. In the two subsequent steps, the reasoner constructs a complete KEγ -tableau TKB whose open branches represent all possible models for the input KB φKB (see below for the definition of KEγ -tableau). The tableau TKB is constructed (1) by systematically applying the following two rules: (1a) a generalization of the KE-elimination rule incorporating the γ-rule, and (1b) the principle of bivalence rule (PB-rule) (thus constructing all branches of the KEγ - tableau—see Figure 2), and then (2) processing each open branch ϑ of TKB by constructing the equivalence classes of the individuals involved in formulae of type x = y occurring in ϑ and substituting each individual x on ϑ with the representative of the equivalence class of x. Such step returns the complete KEγ - tableau. Finally, the reasoner takes as input the internal coding of ψQ , i.e. the set-theoretic representation of a query Q, and computes the HO-answer set of ψQ with respect to φKB . The task of computing the complete KEγ -tableau for φKB is performed by procedure Consistency-DL4,× D illustrated in [8, Figure 8], whereas the task of computing the HOCQA answer set of a given query w.r.t the KB is performed by procedure HOCQAγ -DL4,× D shown in [8, Figure 9]. Let ΦKB := {φ : φ is a conjunct of φKB }. The procedure Consistency-DL4,×D constructs a complete KEγ -tableau TKB for the set ΦKB of the conjuncts of φKB representing the saturation of the DL4,× D -KB. At this point, it is convenient to give the definition of KEγ -tableaux. Let Φ := {C1 , . . . , Cp }, where each Ci is either a 4LQSRDL -literal of the types in Table 1 or 4,× D a 4LQSRDL -purely universal quantified formula of the form (∀x1 ) . . . (∀xm )(β1 ∨ 4,× D . . . ∨ βn ), with β1 . . . βn 4LQSRDL -literals. T is a KEγ -tableau for Φ if there 4,× D exists a finite sequence T1 , . . . , Tt such that (i) T1 is a one-branch tree consisting of the sequence C1 , . . . , Cp , (ii) Tt = T , and (iii) for each i < t, Ti+1 is obtained from Ti either by an application of one of the rules (Eγ -rule or PB-rule) in Figure 2 or by applying a substitution σ to a branch ϑ of Ti (in particular, the substitution σ is applied to each formula X of ϑ and the resulting branch will be denoted with ϑσ). In the definition of the Eγ -rule reported in Figure 2, (a) τ := {x1 /xo1 . . . xm /xom } is a substitution such that x1 , . . . , xm are the quantified variables in ψ and xo1 , . . . , xom ∈ Var0 (φKB ); and (b) S β i τ := {β 1 τ, . . . , β n τ } \ {β i τ } is a set containing the complements of all the disjuncts β1 . . . βn to which the substitution τ is applied, with the exception of the disjunct βi . Initially, the procedure Consistency-DL4,× D constructs a one-branch KE - γ tableau TKB for the set ΦKB of conjuncts of φKB . Then, it expands TKB by systematically applying the Eγ -rule and the PB-rule in Figure 2 to formulae of type ψ = (∀x1 ) . . . (∀xm )(β1 ∨ . . . ∨ βn ) till they are all fulfilled, giving priority to the Eγ -rule. Once such rules are no longer applicable, for each open branch ϑ of the resulting KEγ -tableau, atomic formulae of type x = y occurring in ϑ are used to compute the equivalence class of x and y. For each open branch ϑ of TKB , the equivalence class of each variable occurring in ϑ is obtained by computing the substitution σϑ such that ϑσϑ does not contain literals of type x = y, for distinct x, y. The resulting pair (ϑ, σϑ ) is added to the set E. ψ S βi τ γ E -rule PB-rule βi τ A | A where where A is a literal ψ := (∀x1 ) . . . (∀xm )(β1 ∨ . . . ∨ βn ), τ := {x1 /xo1 . . . xm /xom }, and S β i τ := {β 1 τ, ..., β n τ } \ {β i τ }, for i = 1, ..., n Fig. 2. Expansion rules for the KEγ -tableau. The procedure HOCQAγ -DL4,× D takes as input a query ψQ and the set E yielded by the procedure Consistency-DL4,× D and returns the answer set Σ of 0 ψQ w.r.t. φKB . For each open and complete branch ϑ of TKB , the procedure HOCQAγ -DL4,× D builds a decision tree Dϑ where each maximal branch induces a substitution σ 0 such that σϑ σ 0 belongs to the answer set of ψQ w.r.t. to φKB . Dϑ is obtained by constructing a stack of its nodes. Initially the stack contains just the root node (, ψQ σϑ ) of Dϑ , with  the empty substitution. At each step, the procedure pops out from the stack an element (σ 0 , ψQ σϑ σ 0 ) and iteratively selects a literal q from the query ψQ σϑ σ 0 and eliminates it from ψQ σϑ σ 0 . Then, the set of literals t in ϑ matching q is computed by putting Litϑq := {t ∈ ϑ : t = qρ, for some substitution ρ}. The successors of the current node are computed by pushing the node (σ 0 ρ, ψQ σϑ σ 0 ρ) in the stack, for each element in Litϑq . If the current node has the form of (σ 0 , λ), with λ the empty query, the last literal of ψQ has been treated and the substitution σϑ σ 0 is inserted in Σ 0 . Notice that, in case of a failing query match, the set Litϑq is empty and then no successor node is pushed into the stack. Thus, the failing branch of Dϑ is abandoned and another branch is selected by popping one of the nodes of Dϑ from the stack. Computational complexity results can be found in [9]. 3.1 Some implementation details We first show how the internal coding of DL4,× D -KBs is represented in terms of R 4LQSDL -formulae and the data-structures used by the reasoner for representing 4,× D formulae, nodes, and how KEγ -tableaux are implemented. Then we describe the most relevant functions that implement the procedures Consistency-DL4,× D and γ 4,× 4,× HOCQA -DLD and also illustrate an example of reasoning in DLD . To begin with, 4LQSRDL -variables, quantifiers, Boolean operators, set-theoretic 4,× D i relators, and pairs are mapped into strings as follows. Variables of type Xname are mapped into strings of the form Vi {name}. For the sake of uniformity, vari- ables of sort 0 are denoted with X 0 , Y 0 , . . ., whereas individuals a, concepts C, and roles R of a DL4,× D -KB are respectively mapped into the variables Xa , XC , 0 1 3 and XR , according to the function θ described in [5]. The symbols ∀, ∧, ∨, ¬∧, ¬∨ are mapped into the strings $FA, $AD, $OR, $DA, $RO, respectively. The re- lators ∈, 6∈, =, 6= are mapped into the strings $IN, $NI, $EQ, $QE, respectively. A pair hX10 , X20 i is mapped into the string $OA V01 $CO V02 $AO, where $OA represents the bracket “h”, $AO the bracket “i”, and $CO the comma symbol. Then, data-structures for representing the KB are built. 4LQSRDL -variables 4,× D are implemented by means of the class Var that has four fields. The field type of type integer indicates the sort of the variable, the field name of type string represents the name of the variable, and the field var of type integer represents a free variable if set to 0, and a quantified variable if set to 1. The field index stores the position of the variable in the vector VVL, delegated to collect free variables. Quantified and free variables are collected in the vectors VQL and VVL respectively, which provide a subvector for each sort of variable. The operators admitted in 4LQSRDL , internally coded as strings, are mapped 4,× D into three vectors that are fields of the class Operator. Specifically, the vector boolOp contains the values $OR, $AD, $RO, $DA, the vector setOp the values $IN, $EQ, $NI, $QE, $OA, $AO, $CO, and the vector qutOp the value $FA. 4LQSRDL -literals are stored using the class Lit that has two fields. The field 4,× D litOp of type integer represents the operator of the formula and corresponds to the index of one of the first four elements of the vector setOp. The field components is a vector whose elements point to the variables involved in the literal and stored in VQL and VVL. 4LQSRDL -formulae are represented by the class Formula, having a binary 4,× D tree structure, whose nodes contain objects of the class Lit. The left and right children contain the left and right subformula, respectively. The class Formula contains the following fields: the field lit of type pointer to Lit represents the lit- eral; the field operand represents the propositional operator, and its value is the index of the corresponding element of the vector boolOp; the field psubformula of type pointer to Formula is the pointer to the father node, whereas the fields lsubformula and rsubformula contain the pointers to the nodes representing the left and the right component of the formula, respectively. The procedure Consistency-DL4,× D is based on the data-structure implemented by the class Tableau. This class uses the instances of the class Node that repre- sents the nodes of the KEγ -tableau. The class Node has a tree-shaped structure and four fields: the field setFormula, of type vector of Formula, that collects the formulae of the current node, and three pointers to instances of the class Node. These are the leftchild, rightchild, and father fields, which point to the left child node, right child node, and father node, respectively. The root node of the class Tableau contains the field root of type pointer to Node. The fields openbranches and closedbranches collect the set of open branches and of closed branches, respectively. In addition, the class Tableau is provided with the field EqSet that is a three-dimensional vector of integers storing the equivalence classes induced by atomic formulae of type X 0 = Y 0 . In particular, EqSet stores a vector containing the indices of VVL corresponding to the variables belonging to the equivalence classes. As mentioned above, the reasoner takes as input an OWL ontology compat- ible with the DL4,×D requirements, also admitting SWRL rules, and serialized in the OWL/XML syntax. As first step, the function readOWLXMLontology pro- duces the internal coding of all axioms and assertions of the ontology, yielding a list of strings. Then the reasoner builds from the output of readOWLXMLontology the objects of type Formula that implement the 4LQSRDL -formulae represent- 4,× D ing the KB, and stores them in the field root of an object of type Tableau. In this phase, formulae are transformed in CNF and universal quantifiers are moved as inward as possible and renamed in such a way as to be pairwise dis- tinct. The object of type Tableau representing the KEγ -tableau is the input to the procedure expandGammaTableau that expands the KEγ -tableau by iter- atively selecting and fulfilling purely universal quantified input formulae. Once a purely universal quantified formula has been selected, expandGammaTableau builds iteratively the set of substitutions τ to be applied to the selected formula. A substitution τ is a map from the indices of the quantified variables of the formula, selected in order of appearance, to the elements of the vector VVL. The implementation of τ applies standard techniques for computing the variations with repetition of the set of indices of the elements of VVL taken to k by k, where k is the number of quantified variables occurring in the selected formula. The procedure expandGammaTableau fulfills the formula selected by system- atically applying the functions EGrule with the current τ and PBrule, respec- tively implementing the Eγ -rule and the PB-rule. More precisely, it works as follows. The disjuncts of the current formula to which τ is applied are stored in a temporary vector and selected iteratively. If a disjunct has its negation on the branch, it is removed from the temporary vector. Once all the elements of the temporary vector have been selected, if the last one does not have its negation on the branch, then EGrule is applied to the formula and the last element of the temporary vector is inserted in the branch according to Figure 2. If there is more than one element left in the temporary vector, then the procedure PBrule is applied. In case the stack is empty, a contradiction is found and the branch gets closed and inserted in the vector closedbranches. If the procedure expandGammaTableau terminates with some elements in openbranches, then the reasoner builds the set of equivalence classes of the vari- ables involved in formulae of type X 0 = Y 0 , for each element of openbranches by means of the procedure buildsEqSet. The latter procedure updates the field EqSet of the object of type Tableau with the new information concerning the set of equivalence classes. After the execution of buildsEqSet, if openbranches contains some elements, a consistent KB is returned. Procedure HOCQAγ -DL4,× D is implemented by the function performQuery that takes as input the object of type Tableau returned by buildsEqSet and a string representing the internal coding of the input query ψQ , and returns an object of type QueryManager storing, among other information, the answer set of ψQ w.r.t. φKB . The function performQuery uses an object of type QueryManager that stores the input query ψQ as a string, an object of type Formula representing ψQ , and the answer set of ψQ w.r.t. φKB , for each element of openbranches. The answer set is implemented by endowing the object of type QueryManager with the pair of vectors VarMatch. The first vector of VarMatch contains an integer for each element in openbranches: this is set to 1 if the corresponding branch has solution, 0 otherwise. The second (three-dimensional) vector contains for each element in openbranches a vector of solutions, each one constituted by a vector of pairs of pointers to Var. The first Var of such pair is a variable belonging to the query, whereas the second Var is the matched individual. For each element in openbranches, the function performQuery implements a decision tree by means of a stack that keeps track of the partial solutions of the query, as nodes of the decision tree. Such a stack, called matchSet, is constituted by a vector of pairs of objects of type Var such that the first one represents the query variable and the second one the matched element. Initially, matchSet is empty. At first step, the procedure selects the first conjunct of the query and, for each match found, it pushes in matchSet a vector of pairs representing the match. The procedure selects iteratively the conjuncts of the query and then applies to the selected conjunct the substitution that is currently at the top of matchSet. If the literal obtained by the application of such partial solution has one or more matches in the branch, the resulting substitutions are pushed in matchSet. Once all the literals of the query have been processed, if matchSet is not empty, it contains the leaves of the maximal branches of the decision tree, which are all added to VarMatch. × 3.2 Example of reasoning in DL4, D Let us consider the ontology displayed in Figure 3. Fig. 3. OWL ontology of the example. The KB in terms of 4LQSRDL 4,× D is the following formula: 3 φKB := ¬(hxEva , xAnn i ∈ XM other ) ∧ 3 3 hxAnn , xAnn i ∈ XRelative ∧ hxEva , xEva i ∈ XRelative ∧ 3 3 (∀z1 )(∀z2 )(¬(hz1 , z2 i ∈ XM other ) ∨ hz1 , z2 i ∈ XRelative ) 3 Let ψQ = hz, xEva i ∈ XM other be a query represented in set-theoretic terms. A complete KEγ -tableau for φKB and the decision trees constructed for the evaluation of ψQ on each open branch of the KEγ -tableau are shown in Figure 4. Notice that, the decision tree constructed on the leftmost open branch of the KEγ -tableau provides no solution. Fig. 4. KEγ -tableau for φKB and decision trees for the evaluation of ψQ . The internal representation of the OWL ontology is shown in Figure 5. The KEγ -tableau computed by the reasoner and the evaluation of the query ψQ are reported in Figure 6. Finally, Figure 7 shows a performance comparison between our implementation of the KE-tableau presented in [7] and the KEγ -tableau system for DL4,× D presented in this paper. The metric used in the benchmarking is the number of models of the input KB computed by the reasoners and the time required to compute such models. As shown in Figure 7, the KEγ -tableau has a better performance than the KE-tableau up to about 400%, even if in some cases the performances of the two systems are comparable, as shown in the plot. We conclude that the KEγ -tableau system is always convenient, also because the expansions of quantified formulae are not stored in memory. Fig. 5. Internal representation of the ontology. Fig. 6. The KEγ -tableau and the evaluation of ψQ computed by the reasoner. Fig. 7. Comparison between KE-tableau and KEγ -tableau systems. 4 Conclusions We presented a C++ implementation of a KEγ -tableau system for the most widespread reasoning tasks of DL4,× 4,× D , such as consistency checking of DLD -KBs 4,× and a generalization of the CQA problem for DLD , , called HOCQA problem, admitting conjunctive queries with variables of three sorts. These problems have been addressed by translating DL4,× 4,× D -KBs and higher-order DLD -conjunctive queries in terms of formulae of the set-theoretic fragment 4LQSRDL . The rea- 4,× D soner is an improvement of the KE-tableau system introduced in [7] to check consistency of DL4,× D -KBs, as it admits a generalization of the KE-elimination rule incorporating the γ-rule. The reasoner takes as input OWL ontologies com- patible with the specifications of DL4,× D serialized in the OWL/XML format and admitting SWRL rules. Finally, we showed that the reasoner presented in this paper is more efficient than the one introduced in [7], by means of suitable benchmark test sets. We plan to extend the set-theoretic fragment underpinning the reasoner to include also a restricted version of the operator of relational composition. This will allow ones to reason with description logics that admit full existential and universal quantification. In addition, we intend to improve our reasoner so as to deal with the reasoning problem of ontology classification. Then, we shall compare the resulting reasoner with existing well-known reasoners such as Her- miT [12] and Pellet [18], providing some benchmarking. We also plan to allow data type reasoning by either integrating existing solvers for the Satisfiability Modulo Theories (SMT) problem or by designing ad-hoc new solvers. Finally, as each branch of the KEγ -tableau can be computed by a single processing unit, we plan to implement a parallel version of the software by using the Nvidia CUDA library. References 1. C. Cantale, D. Cantone, M. Nicolosi-Asmundo, and D.F. Santamaria. Dis- tant Reading Through Ontologies: The Case Study of Catania’s Benedictines Monastery. JIS.it, 8,3 (September 2017). 2. D. Cantone, A. Ferro, and E. G. Omodeo. Computable set theory. Number 6 in International Series of Monographs on Computer Science, Oxford Science Publi- cations. Clarendon Press, Oxford, UK, 1989. 3. D. Cantone and M. Nicolosi-Asmundo. On the satisfiability problem for a 4-level quantified syllogistic and some applications to modal logic. Fundamenta Informat- icae, 124(4):427–448, 2013. 4. D. Cantone, M. Nicolosi-Asmundo, and D. F. Santamaria. Conjunctive query answering via a fragment of set theory. In Proc. of ICTCS 2016, Lecce, September 7-9, CEUR-WS Vol. 1720, pp. 23–35. 5. D. Cantone, M. Nicolosi-Asmundo, and D. F. Santamaria. A set-theoretic approach to ABox reasoning services. In Costantini S., Franconi E., Van Woensel W., Kontchakov R., Sadri F., Roman D. Rules and Reasoning. RuleML+RR 2017., Lecture Notes in Computer Science, vol 10364. Springer, 2017. 6. D. Cantone, M. Nicolosi-Asmundo, and D. F. Santamaria. A set-theoretic approach to ABox reasoning services. CoRR, 1702.03096, 2017. Extended version. 7. D. Cantone, M. Nicolosi-Asmundo, and D. F. Santamaria. A C++ reasoner for × the description logic DL4,D . In Proc.of CILC 2017, 26-29 September 2017, Naples, Italy. CEUR WS, ISSN 1613-0073, Vol. 1949, pp. 276-280., 2017. 8. D. Cantone, M. Nicolosi-Asmundo, and D. F. Santamaria. A set-based reasoner × for the description logic DL4,D . CoRR, 1805.08606, 2018. Extended version. 9. D. Cantone, M. Nicolosi-Asmundo, and D. F. Santamaria. An optimized KE- × tableau-based system for reasoning in the description logic DL4, D . CoRR, 1804.11222, 2018. Extended version. 10. D. Cantone, M. Nicolosi-Asmundo, D. F. Santamaria, and F. Trapani. Onto- ceramic: an OWL ontology for ceramics classification. In Proc. of CILC 2015, CEUR-WS, vol. 1459, pp. 122–127, Genova, July 1-3, 2015. 11. D. Cantone, E. Omodeo, and A. Policriti. Set theory for computing: from decision procedures to declarative programming with sets. Monographs in Computer Science. Springer-Verlag, New York, NY, USA, 2001. 12. B. Glimm, I. Horrocks, B. Motik, G. Stoilos, and Z. Wang. HermiT: An OWL 2 Reasoner. Journal of Automated Reasoning, 53(3):245–269, 2014. 13. I. Horrocks, O. Kutz, and U. Sattler. The even more irresistible SROIQ. In Proc. 10th Int. Conf. on Princ. of Knowledge Representation and Reasoning, (Doherty, P. and Mylopoulos, J. and Welty, C. A., eds.), pages 57–67. AAAI Press, 2006. 14. M. Mondadori M. D’Agostino. The taming of the cut. Classical refutations with analytic cut. Journal of Logic and Computation, 4:285–319, 1994. 15. B. Motik and I. Horrocks. OWL datatypes: Design and implementation. In Proc. of the 7th Int. Semantic Web Conference (ISWC 2008), volume 5318 of LNCS, pages 307–322. Springer, October 26–30 2008. 16. D. F. Santamaria. A Set-Theoretical Representation for OWL 2 Profiles. LAP Lambert Academic Publishing, ISBN 978-3-659-68797-6, 2015. 17. Jacob T. Schwartz, Domenico Cantone, and Eugenio G. Omodeo. Computational Logic and Set Theory: Applying Formalized Logic to Analysis. Texts in Computer Science. Springer-Verlag New York, Inc., 2011. 18. E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, and Y. Katz. Pellet: A practical OWL-DL reasoner. J. Web Sem., 5(2):51–53, 2007. 19. R. M. Smullyan. First-order Logic. Dover books on advanced Math. Dover, 1995.