A graph-based approach for classifying OWL 2 QL ontologies*

Domenico Lembo, Valerio Santarelli, and Domenico Fabio Savo
Dipartimento di Ing. Informatica, Automatica e Gestionale “Antonio Ruberti”
Sapienza Università di Roma
Via Ariosto 25, I-00186 Roma, Italy
{lembo,santarelli,savo}@dis.uniroma1.it

Abstract. Ontology classification is the reasoning service that computes all subsumption relationships inferred in an ontology between concept, role, and attribute names in the ontology signature. OWL 2 QL is a tractable profile of OWL 2 for which ontology classification is polynomial in the size of the ontology TBox. However, to date, no efficient methods and implementations specifically tailored to OWL 2 QL ontologies have been developed. In this paper, we provide a new algorithm for ontology classification in OWL 2 QL, which is based on the idea of encoding the ontology TBox into a directed graph and reducing core reasoning to computation of the transitive closure of the graph. We have implemented the algorithm in the QuOnto reasoner and extensively evaluated it over very large ontologies. Our experiments show that QuOnto outperforms various popular reasoners in classification of OWL 2 QL ontologies.

1 Introduction

Ontology classification is the problem of computing all subsumption relationships inferred in an ontology between predicate names in the ontology signature, i.e., named concepts (a.k.a. classes), roles (a.k.a. object-properties), and attributes (a.k.a. data-properties). It is considered a core service for ontology reasoning, which can be exploited for various tasks, at both design-time and run-time, ranging from ontology navigation and visualization to query answering. Devising efficient ontology classification methods and implementations is a challenging issue, since classification is in general a costly operation.
Most popular reasoners for Description Logic (DL) ontologies, i.e., OWL ontologies, such as Pellet [23], Racer [11], FaCT++ [24], and HermiT [9], offer highly optimized classification services for expressive DLs. Various experimental studies show that such reasoners have reached very good performance over the years. However, they are still not able to efficiently classify very large ontologies, such as the full versions of GALEN [22] or of the FMA ontology [10].

* This paper is an extended abstract of [18].

Whereas the above tools use algorithms based on model construction through tableau (or hyper-tableau [9]), the CB reasoner [14] for the Horn-SHIQ DL is a consequence-driven reasoner. The use of this technique allows CB to obtain an impressive gain on very large ontologies, such as full GALEN. However, the current implementation of the CB reasoner is rather specific to particular fragments of Horn-SHIQ (and incomplete for the general case) [14]. For example, it does not allow for classification of properties.

Other recently developed tools, such as Snorocket [17], ELK [15], and JCEL [20], are specifically tailored to intensional reasoning over logics of the EL family, and show excellent performance in classification of ontologies specified in such languages, which are the logical underpinning of OWL 2 EL, one of the tractable profiles of OWL 2 [21].

By contrast, to the best of our knowledge, ontology classification in the other OWL 2 profiles has so far received little attention. In particular, classification in OWL 2 RL has been investigated only in [16], whereas, to date, no techniques have been developed that are specifically tailored to intensional reasoning in OWL 2 QL, the “data oriented” profile of OWL 2, nor for any logic of the DL-Lite family [7]¹, which constitutes the logical underpinning of OWL 2 QL.
Our aim is then to contribute to filling this gap for OWL 2 QL, encouraged also by the fact that this language, like all logics of the DL-Lite family, allows for tractable intensional reasoning, and in particular for PTime ontology classification, as immediately follows from the results in [7].

In this paper, we thus provide a new method for ontology classification in the OWL 2 QL profile. In our technique, we encode the ontology terminology (TBox) into a graph, and compute the transitive closure of the graph to then obtain the ontology classification. The analogy between simple inference rules in DLs and graph reachability is indeed very natural: consider, for example, an ontology containing the subsumptions A1 ⊑ A2 and A2 ⊑ A3, where A1, A2, and A3 are class names in the ontology signature. We can then associate to this ontology a graph having three nodes labeled with A1, A2, and A3, respectively, an edge from A1 to A2, and an edge from A2 to A3. It is straightforward to see that A3 is reachable from A1, and therefore an edge from A1 to A3 is contained in the transitive closure of the graph. This corresponds to the inferred subsumption A1 ⊑ A3. On the other hand, things soon become much more complicated when complex (OWL) axioms come into play.

In this respect, we will show that for an OWL 2 QL ontology it is possible to easily construct a graph whose transitive closure computation constitutes the major sub-task in ontology classification, because it allows us to obtain all subsumptions inferred by the “positive knowledge” specified by the TBox. We will show that the computed classification misses only “trivial” subsumptions inferred by unsatisfiable predicates, i.e., named classes (resp. properties) that have an empty interpretation in every model of the ontology, and that are therefore subsumed by every class (resp. property) in the ontology signature.
We therefore provide an algorithm that, exploiting the transitive closure of the graph, computes all unsatisfiable predicates, thus allowing us to obtain a complete ontology classification. We notice that the presence of unsatisfiable predicates in an ontology is mainly due to errors in the design. However, it is not rare to find such predicates, especially in very large ontologies or in ontologies that are still “under construction”. In particular, we found unsatisfiable concepts even in some benchmark ontologies we used in our experiments (cf. Section 4). Of course, already debugged ontologies might not present such predicates [13,12]. In this case, one can avoid executing our algorithm for computing unsatisfiable predicates.

¹ Not to be confused with the set of DLs studied in [2], which form the DL-Lite_bool family.

We have implemented our technique in a new module of QuOnto [1], the reasoner at the base of the Mastro [6,8] system, and have carried out extensive experimentation, focusing in particular on very large ontologies. We have considered a number of well-known ontologies, often used as benchmarks for ontology classification, and have suitably approximated in OWL 2 QL those that fall outside this language.

QuOnto showed better performance, in some cases corresponding to enormous gains, with respect to tableau-based reasoners (in particular, Pellet, FaCT++, and HermiT). We also obtained comparable or better results with respect to the CB reasoner for almost all ontologies considered but, differently from the CB reasoner, we were always able to compute a complete classification. We finally compared QuOnto with ELK, one of the best-performing reasoners for EL, on those approximated ontologies that turned out to be both in OWL 2 QL and OWL 2 EL, obtaining similar performance in almost all cases.
We conclude by noticing that, even though we refer here to OWL 2 QL, our algorithms and implementations can be easily adapted to deal with all logics of the DL-Lite family mentioned in [7], excluding those allowing for the use of conjunction in the left-hand side of inclusion assertions or the use of n-ary relations instead of binary roles.

The rest of the paper is organized as follows. In Section 2, we provide some preliminaries. In Section 3, we describe our technique for ontology classification in OWL 2 QL. In Section 4, we describe our experimentation, and finally, in Section 5, we conclude the paper.

2 Preliminaries

In this section, we present some basic notions on DL ontologies, the formal underpinning of the OWL 2 language, and on OWL 2 QL. We also recall some notions of graph theory needed later on.

Description Logic Ontologies. We consider a signature Σ, partitioned in two disjoint signatures, namely, Σ_P, containing symbols for predicates, i.e., atomic concepts, atomic roles, atomic attributes, and value-domains, and Σ_C, containing symbols for individual (object and value) constants. Complex concept, role, and attribute expressions are constructed starting from predicates of Σ_P by applying suitable constructs, which vary in different DL languages. Given a DL language L, an L-TBox (or simply a TBox, when L is clear) over Σ contains universally quantified first-order (FOL) assertions, i.e., axioms specifying general properties of concepts, roles, and attributes. Again, different DLs allow for different axioms. An L-ABox (or simply an ABox, when L is clear) is a set of assertions on individual constants, which specify extensional knowledge. An L-ontology O is constituted by both an L-TBox T and an L-ABox A, denoted as O = ⟨T, A⟩. The semantics of a DL ontology O is given in terms of FOL interpretations (cf. [4]).
We denote with Mod(O) the set of models of O, i.e., the set of FOL interpretations that satisfy all TBox axioms and ABox assertions in O, where the definition of satisfaction depends on the DL language in which O is specified. An ontology O is satisfiable if Mod(O) ≠ ∅. A FOL sentence φ is entailed by an ontology O, denoted O ⊨ φ, if φ is satisfied by every model in Mod(O). All the above notions naturally apply to a TBox T alone.

Traditional intensional reasoning tasks with respect to a given TBox are verification of subsumption and satisfiability of concepts, roles, and attributes [4]. More precisely, a concept C1 is subsumed in T by a concept C2, written T ⊨ C1 ⊑ C2, if, in every model I of T, the interpretation of C1, denoted C1^I, is contained in the interpretation of C2, denoted C2^I, i.e., C1^I ⊆ C2^I for every I ∈ Mod(T). Furthermore, a concept C in T is unsatisfiable, which we write as T ⊨ C ⊑ ¬C, if the interpretation of C is empty in every model of T, i.e., C^I = ∅ for every I ∈ Mod(T). Analogous definitions hold for roles and attributes.

Strictly related to the previous reasoning tasks is the classification inference service, which we focus on in this paper. Given a signature Σ_P and a TBox T over Σ_P, such a service allows one to determine subsumption relationships in T between concepts, roles, and attributes in Σ_P. Therefore, classification allows one to structure the terminology of T in the form of a subsumption hierarchy that provides useful information on the connection between different terms, and can be used to speed up other inference services. Here we define it more formally.

Definition 1. Let T be a satisfiable L-TBox over Σ_P. We define the T-classification of Σ_P (or simply T-classification when Σ_P is clear from the context) as the set of inclusion assertions defined as follows: let S1 and S2 be two concepts, two roles, or two attributes in Σ_P. If T ⊨ S1 ⊑ S2, then S1 ⊑ S2 belongs to the T-classification of Σ_P.
The OWL 2 QL Language. The OWL 2 QL language is based on DL-Lite_R, a DL of the DL-Lite family [7]. Differently from DL-Lite_R, however, besides object properties (i.e., roles), OWL 2 QL also allows for the use of data properties (i.e., attributes), as well as some further constructs, such as (ir-)reflexivity on properties. For the sake of presentation, we prefer not to consider here attributes, nor (ir-)reflexivity constraints. This choice does not actually correspond to a real simplification, since in the algorithms proposed in this paper we can treat both attributes and roles essentially in the same way, and our techniques can be applied to full OWL 2 QL ontologies with minimal adaptations. Therefore, in the following, we provide a simplified, German style, syntax for OWL 2 QL, which actually corresponds to that of DL-Lite_R, and refer the reader to [21] for the complete, OWL functional-style syntax of this language².

² Notice that (a)symmetric roles allowed in OWL 2 QL, even though not explicitly mentioned, can be easily expressed in the syntax that we consider.

Expressions in OWL 2 QL are formed according to the following syntax:

    B → A | ∃Q            Q → P | P⁻
    C → B | ¬B | ∃Q.A     R → Q | ¬Q

where: A and P are symbols in Σ_P denoting respectively an atomic concept and an atomic role; P⁻ denotes the inverse of P; ∃Q, also called unqualified existential role, denotes the set of objects related to some object by the role Q; the concept ∃Q.A, or qualified existential role, denotes the qualified domain of Q with respect to A, i.e., the set of objects that Q relates to some instance of A. In the following, we call B a basic concept, and Q a basic role.

An OWL 2 QL TBox T is a finite set of axioms of the form B ⊑ C and Q ⊑ R, where the former denote subsumptions between concepts, and the latter subsumptions between roles. We call positive inclusions the axioms of the form B1 ⊑ B2, B1 ⊑ ∃Q.A, and Q1 ⊑ Q2, and negative inclusions the axioms of the form B1 ⊑ ¬B2 and Q1 ⊑ ¬Q2.
The semantics of OWL 2 QL ontologies and TBoxes is given in the standard way [21,4]. As for OWL 2 QL ABoxes, we do not present them here, since we concentrate on intensional reasoning, and refer the interested reader to [21].

Graph Theory Notions. In this paper we use the term digraph to refer to a directed graph. We assume that a digraph G is a pair (N, E), where N is a set of elements called nodes, and E is a set of ordered pairs (s, t) of nodes in N, called arcs, where s is called the source of the arc, and t the target of the arc. The transitive closure G* = (N, E*) of a digraph G = (N, E) is a digraph such that there is an arc in E* having a node s as source and a node t as target if and only if there is a path from s to t in G [5]. Let G = (N, E) be a digraph, and let n be a node in N. We denote with predecessors(n, G) the set of nodes p_n in N such that there exists in E an arc (p_n, n).

3 T-classification in OWL 2 QL

In this section we describe our approach to computing, given a signature Σ_P and an OWL 2 QL TBox T over Σ_P, the T-classification of Σ_P. In OWL 2 QL, a subsumption relation between two concepts or roles in Σ_P can be inferred by a TBox T if and only if (i) T contains such subsumption; (ii) T contains a set of positive inclusion assertions that together entail the subsumption; or (iii), trivially, the subsumed concept or role is unsatisfiable in T. The above observation is formalized as follows.

Theorem 1. Let T be an OWL 2 QL TBox containing only positive inclusions, and let S1 and S2 be two atomic concepts or two atomic roles. S1 ⊑ S2 is entailed by T if and only if at least one of the following conditions holds:
1. a set P of positive inclusions exists in T, such that P ⊨ S1 ⊑ S2;
2. T ⊨ S1 ⊑ ¬S1.
Given an OWL 2 QL TBox T over a signature Σ_P, we use Φ_T and Ω_T to denote two sets of positive inclusions of the form S1 ⊑ S2, with S1, S2 ∈ Σ_P, such that Φ_T contains only positive inclusions for which statement 1 holds, and Ω_T contains only positive inclusions for which statement 2 holds. It is easy to see that Φ_T and Ω_T are not necessarily disjoint. From Definition 1 and Theorem 1 it follows that the T-classification coincides with the union of the sets Φ_T and Ω_T.

In the following, we describe our approach to the computation of the T-classification by first computing the set Φ_T, and then computing the set Ω_T.

Computation of Φ_T. Given an OWL 2 QL TBox T, in order to compute Φ_T, we encode the set of positive inclusions in T into a digraph G_T and compute the transitive closure of G_T in such a way that each subsumption S1 ⊑ S2 in Φ_T corresponds to an arc (S1, S2) in such transitive closure, and vice versa. The following constructive definition describes how to obtain the digraph representation of the TBox suited to our aims.

Definition 2. Let T be an OWL 2 QL TBox over a signature Σ_P. We call the digraph representation of T the digraph G_T = (N, E) built as follows:
1. for each atomic concept A in Σ_P, N contains the node A;
2. for each atomic role P in Σ_P, N contains the nodes P, P⁻, ∃P, ∃P⁻;
3. for each concept inclusion B1 ⊑ B2 ∈ T, E contains the arc (B1, B2);
4. for each role inclusion Q1 ⊑ Q2 ∈ T, E contains the arcs (Q1, Q2), (Q1⁻, Q2⁻), (∃Q1, ∃Q2), and (∃Q1⁻, ∃Q2⁻);
5. for each concept inclusion B1 ⊑ ∃Q.A ∈ T, E contains the arc (B1, ∃Q).

The idea is that each node in the digraph G_T represents a basic concept or a basic role, and each arc models a positive inclusion, i.e., a subsumption, contained in T, where the source node of the arc represents the left-hand side of the subsumption and the target node of the arc represents the right-hand side of the subsumption.
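As a rough illustration of Definition 2, the construction can be sketched in a few lines of Python. The encoding below is an assumption made only for this sketch and does not come from the paper: atomic concepts are plain strings, a basic role is the tuple ("role", P, inverse), and ∃Q is the tuple ("ex", P, inverse). The helper names role, inv, ex, and build_digraph are likewise hypothetical.

```python
def role(name, inverse=False):
    """Basic role Q: the atomic role P or its inverse P⁻."""
    return ("role", name, inverse)


def inv(q):
    """Inverse of a basic role."""
    return ("role", q[1], not q[2])


def ex(q):
    """Node for the unqualified existential ∃Q."""
    return ("ex", q[1], q[2])


def build_digraph(concepts, roles, concept_incl, role_incl, qualified_incl):
    """Build G_T = (N, E) following rules 1-5 of Definition 2."""
    nodes, arcs = set(), set()
    for a in concepts:                          # rule 1
        nodes.add(a)
    for p in roles:                             # rule 2
        q = role(p)
        nodes |= {q, inv(q), ex(q), ex(inv(q))}
    for b1, b2 in concept_incl:                 # rule 3
        arcs.add((b1, b2))
    for q1, q2 in role_incl:                    # rule 4
        arcs |= {(q1, q2), (inv(q1), inv(q2)),
                 (ex(q1), ex(q2)), (ex(inv(q1)), ex(inv(q2)))}
    for b1, q, _filler in qualified_incl:       # rule 5: the filler A is ignored
        arcs.add((b1, ex(q)))
    return nodes, arcs


# Toy TBox (illustrative only): A ⊑ ∃P, P ⊑ R, B ⊑ ∃R⁻.A
nodes, arcs = build_digraph(
    concepts={"A", "B"},
    roles={"P", "R"},
    concept_incl=[("A", ex(role("P")))],
    role_incl=[(role("P"), role("R"))],
    qualified_incl=[("B", role("R", inverse=True), "A")],
)
```

In this toy TBox the single role inclusion P ⊑ R contributes four arcs, one for each of the forms listed in rule 4, while the qualified inclusion contributes only the arc from B to ∃R⁻.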
Observe that for each role inclusion assertion P1 ⊑ P2 in the TBox T, we also represent as nodes and arcs in the digraph G_T the entailed positive inclusions P1⁻ ⊑ P2⁻, ∃P1 ⊑ ∃P2, and ∃P1⁻ ⊑ ∃P2⁻.

Let T be an OWL 2 QL TBox and let G_T = (N, E) be its digraph representation. We denote with G_T* = (N, E*) the transitive closure of G_T. Note that by definition of digraph transitive closure, for each node n ∈ N there exists in E* an arc (n, n). Moreover, in what follows, we denote with α(E*) the set of arcs (S1, S2) ∈ E* such that both terms S1 and S2 denote in T either two atomic concepts or two atomic roles. Then, the following property holds.

Theorem 2. Let T be an OWL 2 QL TBox and let G_T = (N, E) be its digraph representation. Let S1 and S2 be two atomic concepts or two atomic roles. An inclusion assertion S1 ⊑ S2 belongs to Φ_T if and only if there exists in α(E*) an arc (S1, S2).

    Algorithm: computeUnsat
    Input: an OWL 2 QL TBox T
    Output: a set of concept and role expressions

    Emp ← ∅;
    foreach negative inclusion S1 ⊑ ¬S2 ∈ T do                 /* step 1 */
        foreach n1 ∈ predecessors(S1, G_T*) do
            foreach n2 ∈ predecessors(S2, G_T*) do
                if n1 = n2 then Emp ← Emp ∪ {n1};
                if (n1 = ∃Q⁻ and n2 = A) or (n2 = ∃Q⁻ and n1 = A) then Emp ← Emp ∪ {∃Q.A};
    Emp' ← ∅;
    while Emp ≠ Emp' do                                        /* step 2 */
        Emp' ← Emp;
        foreach S ∈ Emp' do
            foreach n ∈ predecessors(S, G_T*) do
                Emp ← Emp ∪ {n};
                if n = P or n = P⁻ or n = ∃P or n = ∃P⁻ then Emp ← Emp ∪ {P, P⁻, ∃P, ∃P⁻};
                if there exists B ⊑ ∃Q.n ∈ T then Emp ← Emp ∪ {∃Q.n};
    return Emp.

    Fig. 1: The algorithm computeUnsat(T)

We can then easily construct an algorithm, called ComputeΦ, that, taken as input an OWL 2 QL TBox T, first builds the digraph G_T = (N, E) according to Definition 2, then computes its transitive closure, and finally returns the set Φ_T, which contains an inclusion assertion S1 ⊑ S2 for each arc (S1, S2) ∈ α(E*).
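The closure-and-filter part of ComputeΦ can be sketched as follows. This is only a minimal illustration under stated assumptions, not the QuOnto implementation: the function names transitive_closure and compute_phi are hypothetical, nodes are plain string labels, and the closure is obtained by a simple breadth-first search from every node.

```python
from collections import deque


def transitive_closure(nodes, arcs):
    """Return E*: all pairs (s, t) such that t is reachable from s.

    Every node reaches itself, matching the remark in the text that
    (n, n) is in E* for each node n.
    """
    successors = {n: set() for n in nodes}
    for s, t in arcs:
        successors.setdefault(s, set()).add(t)
        successors.setdefault(t, set())
    closure = set()
    for start in successors:
        seen = {start}
        queue = deque([start])
        while queue:                      # breadth-first search from `start`
            for nxt in successors[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        closure.update((start, t) for t in seen)
    return closure


def compute_phi(nodes, arcs, atomic_concepts, atomic_roles):
    """Keep only the arcs of E* between two atomic concepts or two atomic roles."""
    return {
        (s, t)
        for s, t in transitive_closure(nodes, arcs)
        if (s in atomic_concepts and t in atomic_concepts)
        or (s in atomic_roles and t in atomic_roles)
    }


# Toy digraph: A1 ⊑ A2, A2 ⊑ ∃P, ∃P ⊑ A3, and P ⊑ R (only the (P, R) arc is listed).
nodes = {"A1", "A2", "A3", "P", "R", "∃P"}
arcs = {("A1", "A2"), ("A2", "∃P"), ("∃P", "A3"), ("P", "R")}
phi = compute_phi(nodes, arcs, {"A1", "A2", "A3"}, {"P", "R"})
```

Here the subsumption A2 ⊑ A3 is obtained only through the intermediate node ∃P, which is why the closure is computed over all nodes before the filter α is applied.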
According to Theorem 2, ComputeΦ is sound and complete with respect to the problem of computing Φ_T for any OWL 2 QL TBox T containing only positive inclusions.

Computation of Ω_T. We first observe that, according to Definition 2, no node corresponding to a qualified existential role is created in the TBox digraph representation. This kind of node is indeed not useful for computing Φ_T. Differently, if one aims to identify every cause of unsatisfiability, the creation of nodes corresponding to a qualified existential role is needed. This is due to the fact that a TBox may entail that a qualified existential role ∃P.A is unsatisfiable, even in case of satisfiability of ∃P. Specifically, this may occur in two cases: (i) if the TBox T entails the assertion ∃P⁻ ⊑ ¬A, and (ii) if the TBox T entails A ⊑ ¬A. Clearly, in both cases the concept ∃P.A is unsatisfiable. We therefore modify here Definition 2 by substituting Rule 5 with the following one:

5*. for each concept inclusion B1 ⊑ ∃Q.A ∈ T, N contains the node ∃Q.A, and E contains the arcs (B1, ∃Q.A) and (∃Q.A, ∃Q).

From now on, we adopt the digraph representation built according to Definition 2, where rule 5* replaces rule 5. Given an OWL 2 QL TBox T over a signature Σ_P, the algorithm computeUnsat given in Figure 1 returns all unsatisfiable concepts and roles in Σ_P, by exploiting the transitive closure of the digraph representation of T.

Before describing the algorithm, we recall that, given a digraph G = (N, E) and a node n ∈ N, the set predecessors(n, G*) contains all those nodes n' in N such that G* contains the arc (n', n), which means that there exists a path from n' to n in G. Also, it can be shown that G_T* in fact allows us to obtain all subsumptions between satisfiable basic concepts or roles, in the sense that the TBox T infers one such subsumption S1 ⊑ S2 if and only if there is an arc (S1, S2) in E*.
Then, the two steps that compose the algorithm proceed as follows:

Step 1. Let S be either a concept expression or a role expression. We have that for each S^i ∈ predecessors(S, G_T*) the TBox T entails S^i ⊑ S. Hence, given a negative inclusion assertion S1 ⊑ ¬S2, for each S1^i ∈ predecessors(S1, G_T*) and for each S2^j ∈ predecessors(S2, G_T*), T ⊨ S1^i ⊑ ¬S2^j. Therefore, for each negative inclusion S1 ⊑ ¬S2 ∈ T, the algorithm computes the sets predecessors(S1, G_T*) and predecessors(S2, G_T*) and is able to: (i) recognize as unsatisfiable all those concepts and roles whose corresponding nodes occur in both predecessors(S1, G_T*) and predecessors(S2, G_T*), and (ii) identify those unsatisfiable qualified existential roles ∃Q.A whose inverse existential role node ∃Q⁻ occurs in predecessors(S1, G_T*) (resp. predecessors(S2, G_T*)) and whose concept node A occurs in predecessors(S2, G_T*) (resp. predecessors(S1, G_T*)), which indeed implies ∃Q⁻ ⊑ ¬A and therefore unsatisfiability of ∃Q.A.

Step 2. Further unsatisfiable concepts and roles are identified by the algorithm through a cycle in which: (i) if a concept or role S is in Emp, then all the expressions corresponding to the nodes in predecessors(S, G_T*) are added to Emp, which captures propagation of unsatisfiability through chains of positive inclusions; (ii) if at least one of the expressions P, P⁻, ∃P, ∃P⁻ is in Emp, then all four expressions are added to Emp; (iii) for each expression ∃Q.A in N, if A ∈ Emp, then ∃Q.A is added to Emp. We notice that the algorithm stops cycling when no new expressions of the form ∃Q or ∃Q.A are added to Emp (indeed, in this case only a single further iteration may be needed).

It is easy to see that, since the set N of nodes of the digraph representation of the TBox T is finite, computeUnsat(T) terminates, and that the number of executions of the while cycle is less than or equal to |N|.
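The two steps can be sketched in Python as follows. This is a hedged illustration rather than the authors' code. The tuple encoding is an assumption of the sketch: atomic concepts are strings, ("role", P, inverse) is P or P⁻, ("ex", P, inverse) is ∃P or ∃P⁻, and ("qex", P, inverse, A) is a qualified existential created by rule 5*. The names all_predecessors, role_family, and compute_unsat are hypothetical. As a further simplification, Step 1 only marks qualified existentials that actually occur as nodes of the digraph.

```python
from collections import deque


def all_predecessors(nodes, arcs):
    """Map each node n to predecessors(n, G*), computed by BFS over reversed arcs."""
    reverse = {n: set() for n in nodes}
    for s, t in arcs:
        reverse.setdefault(t, set()).add(s)
        reverse.setdefault(s, set())
    preds = {}
    for n in reverse:
        seen, queue = {n}, deque([n])     # n counts as its own predecessor in G*
        while queue:
            for m in reverse[queue.popleft()]:
                if m not in seen:
                    seen.add(m)
                    queue.append(m)
        preds[n] = seen
    return preds


def role_family(name):
    """The four nodes P, P⁻, ∃P, ∃P⁻ associated with the atomic role `name`."""
    return {(kind, name, inv) for kind in ("role", "ex") for inv in (False, True)}


def compute_unsat(nodes, arcs, negative_inclusions):
    preds = all_predecessors(nodes, arcs)
    qualified = [n for n in preds if isinstance(n, tuple) and n[0] == "qex"]
    emp = set()
    # Step 1: nodes below both sides of a negative inclusion S1 ⊑ ¬S2 are empty.
    for s1, s2 in negative_inclusions:
        p1, p2 = preds.get(s1, {s1}), preds.get(s2, {s2})
        emp |= p1 & p2
        for left, right in ((p1, p2), (p2, p1)):
            for _, name, inv, filler in qualified:
                # ∃Q.A is empty when ∃Q⁻ is on one side and A on the other.
                if ("ex", name, not inv) in left and filler in right:
                    emp.add(("qex", name, inv, filler))
    # Step 2: propagate emptiness until nothing changes.
    previous = None
    while previous != emp:
        previous = set(emp)
        for s in previous:
            for n in preds.get(s, {s}):
                emp.add(n)
                if isinstance(n, tuple) and n[0] in ("role", "ex"):
                    emp |= role_family(n[1])
        for q in qualified:
            if q[3] in emp:
                emp.add(q)
    return emp


# Toy TBox (illustrative only), already encoded as a digraph with rule 5*.
ex_p, qex_pa = ("ex", "P", False), ("qex", "P", False, "A")
ex_r, qex_rf = ("ex", "R", False), ("qex", "R", False, "F")
nodes = {"A", "B", "C", "D", "E", "F", "G", qex_pa, qex_rf}
nodes |= role_family("P") | role_family("R") | role_family("S")
arcs = {
    ("A", "B"), ("A", "C"),                  # A ⊑ B, A ⊑ C
    ("D", qex_pa), (qex_pa, ex_p),           # D ⊑ ∃P.A   (rule 5*)
    ("E", qex_rf), (qex_rf, ex_r),           # E ⊑ ∃R.F   (rule 5*)
    ("G", ("ex", "S", False)),               # G ⊑ ∃S
}
negative = [
    ("B", "C"),                              # B ⊑ ¬C
    (("ex", "R", True), "F"),                # ∃R⁻ ⊑ ¬F
    (("ex", "S", True), ("ex", "S", True)),  # ∃S⁻ ⊑ ¬∃S⁻
]
unsat = compute_unsat(nodes, arcs, negative)
```

In this example A is caught by Step 1, while D, E, and G are reached only through the propagation of Step 2; ∃P and ∃R stay satisfiable even though ∃P.A and ∃R.F do not, which is the situation that motivates rule 5*.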
The following theorem shows that algorithm computeUnsat can be used for computing the set containing all the unsatisfiable concepts and roles in T.

Theorem 3. Let T be an OWL 2 QL TBox and let S be either an atomic concept or an atomic role in Σ_P. T ⊨ S ⊑ ¬S if and only if S ∈ computeUnsat(T).

We call ComputeΩ the algorithm that, taken T as input, returns Ω_T by making use of computeUnsat. The following theorem, which is a direct consequence of Theorem 2 and of Theorem 3, states that our technique is sound and complete with respect to the problem of classifying an OWL 2 QL TBox.

Theorem 4. Let T be an OWL 2 QL TBox and let S1 and S2 be two atomic predicates. T ⊨ S1 ⊑ S2 if and only if S1 ⊑ S2 ∈ ComputeΦ(T) ∪ ComputeΩ(T).

    Ontology        Concepts  Roles  Attributes  Original DL fragment  Original axioms  OWL 2 QL axioms  Negative inclusions
    Mouse           2753      1      0           ALE                   3463             3463             0
    Transportation  445       89     4           ALCH(D)               931              931              317
    DOLCE           209       313    4           SHOIN(D)              1736             1991             45
    AEO             760       47     16          SHIN(D)               3449             3432             1957
    Gene            26225     4      0           SH                    42655            42655            3
    EL-Galen        23136     950    0           ELH                   46457            48026            0
    Galen           23141     950    0           ALEHIF+               47407            49926            0
    FMA 1.4         6488      165    0           ALCOIF                18612            18663            0
    FMA 2.0         41648     148    20          ALCOIF(D)             123610           118181           0
    FMA 3.2.1       84454     132    67          ALCOIF(D)             88204            84987            0
    FMA-OBO         75139     2      0           ALE                   119558           119558           0

    Table 1: The Original axioms and OWL 2 QL axioms columns report, respectively, the total number of axioms in the original version of the ontology and in the OWL 2 QL-approximated version. The Negative inclusions column reports the number of negative inclusions in the OWL 2 QL-approximated version.

4 Implementation and Evaluation

By exploiting the results presented in Section 3, we have developed a Java-based OWL 2 QL classification module for the QuOnto reasoner [1,6,8]. This module computes the classification of an OWL 2 QL TBox T by adopting the technique described in Section 3. In this implementation the computation of the transitive closure of the digraph G_T is based on a breadth-first search through G_T.
In the implementation we have considered all aspects of OWL 2 QL which were ignored in the theoretical discussion presented in the previous sections (see Section 2).

We have performed comparative experiments, where QuOnto was tested against several popular ontology reasoners. Specifically, we compared QuOnto with the FaCT++ [24], HermiT [9], and Pellet [23] OWL reasoners, with the CB [14] Horn-SHIQ reasoner, and with the ELK [15] reasoner for those ontologies that are also in OWL 2 EL.

The ontology suite used during testing includes twenty OWL ontologies, assembled from the TONES Ontology Repository³ and from other independent sources. The six reasoners exhibited negligible differences in performance for the majority of the smaller tested ontologies, so we will only discuss the ontologies which offered interesting results, meaning those on which reasoning times are significantly different for at least a subset of the reasoners.

These ontologies include: the Mouse ontology; the Transportation ontology⁴; the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE) [19]; the Athletic Events Ontology (AEO)⁵; the Gene Ontology (GO) [3]; two versions of the GALEN ontology [22]; and four versions of the Foundational Model of Anatomy Ontology (FMA) [10].

Because QuOnto is an OWL 2 QL reasoner, each benchmark ontology not in OWL 2 QL was preprocessed prior to classification in order to fit OWL 2 QL expressivity.
³ http://owl.cs.manchester.ac.uk/repository/
⁴ http://www.daml.org/ontologies/409
⁵ http://www.boemie.org/deliverable d 3 5

    Ontology        QuOnto  FaCT++         HermiT         Pellet   CB     ELK
    Mouse           0.156   0.282          0.296          0.179    0.159  0.246
    Transportation  0.150   0.045          0.163          0.151    0.195  0.343
    DOLCE           1.327   0.245          25.619         1.696    1.358  -
    AEO             0.650   0.743          0.920          0.647    0.605  -
    Gene            1.255   1.400          3.810          2.803    1.918  1.419
    EL-Galen        2.788   109.835        7.966          50.770   2.446  1.205
    Galen           4.600   145.485        34.608         timeout  2.505  -
    FMA 1.4         0.688   timeout        93.781         timeout  1.243  -
    FMA 2.0         4.111   out of memory  out of memory  timeout  7.142  -
    FMA 3.2.1       4.146   4.576          11.518         24.117   4.976  -
    FMA-OBO         4.827   timeout        50.842         16.852   7.433  4.078

    Table 2: Classification times of benchmark OWL 2 QL ontologies by QuOnto and other tested reasoners.

Therefore, every OWL expression which cannot be expressed by OWL 2 QL axioms was approximated from the ontology specifications. This approximation follows this procedure: each axiom in the ontology is fed to an external reasoner, specifically HermiT, and every OWL 2 QL-compliant axiom that is implied by that axiom, between the ontology symbols that appear in it, is added to the OWL 2 QL-approximated ontology. For instance, the OWL assertion EquivalentClasses(ObjectUnionOf(:Male :Female) :Person) is approximated by the two assertions SubClassOf(:Male :Person) and SubClassOf(:Female :Person). Note that, as is the case in this example, the OWL 2 QL-approximated ontology may contain a greater number of axioms than the original ontology. Table 1 shows that the Mouse, Transportation, Gene, and FMA-OBO ontologies are in OWL 2 QL, and thus do not need approximation, while AEO and FMA 1.4 are subject to minimal changes by the approximation procedure. During the tests for each reasoner, classification was performed on the OWL 2 QL-compliant versions of the ontologies resulting from the above described preprocessing. Metrics about the ontologies are reported in Table 1.
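The example above can be mimicked by a purely syntactic rule, sketched below. This is not the reasoner-based procedure described in the text: it is a hypothetical shortcut covering only the single pattern of the example, and the function name and the tuple representation of axioms are assumptions made for illustration.

```python
def approximate_union_equivalence(union_members, named_class):
    """EquivalentClasses(ObjectUnionOf(B1 ... Bn), A) entails SubClassOf(Bi, A) for each i.

    The converse direction (A is included in the union) is not expressible
    in OWL 2 QL, so it is dropped by the approximation.
    """
    return [("SubClassOf", member, named_class) for member in union_members]


axioms = approximate_union_equivalence([":Male", ":Female"], ":Person")
```

One input axiom yields two output axioms, which illustrates why the approximated ontology can end up with more axioms than the original.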
All tests were performed on a DELL Latitude E6320 notebook with an Intel Core i7-2640M 2.8 GHz CPU and 4 GB of RAM, running the Microsoft Windows 7 Premium operating system and Java 1.6 with 2 GB of heap space. The classification timeout was set at one hour, and a run was aborted if the maximum available memory was exhausted. All figures reported in Table 2 are in seconds and, because classification times are subject to minor fluctuations, particularly when dealing with large ontologies, are the average of 3 classifications of the respective ontologies with each reasoner.

The following versions of the OWL reasoners were tested: FaCT++ v.1.5.3, HermiT v.1.3.6, Pellet v.2.3.0, CB v.12, and ELK v.0.3.2.

In our test configuration, the classifications of the FMA 2.0 ontology by the HermiT and FaCT++ reasoners terminate due to an out-of-memory error. In [9], classification of this ontology by the HermiT reasoner is performed successfully, but classification time far exceeds the one registered by QuOnto.

The results of the experiments are summarized in Table 2. These results confirm that the performance offered by QuOnto compares favorably to other reasoners for almost all tested ontologies. Classification for even the largest of the tested ontologies, i.e., the FMA-OBO and FMA 3.2.1 ontologies, is performed in under 5 seconds, and memory space issues were never experienced during our tests with QuOnto. For some test cases, the gap in performance between QuOnto and other reasoners is sizeable: for instance, classification by Pellet of the Galen and FMA (1.4 and 2.0) ontologies and by FaCT++ of the FMA (1.4 and OBO) ontologies exceeds the predetermined timeout limit of one hour.

Detailed analysis of the results provided in Table 2 shows that only the CB and ELK reasoners consistently display performance comparable to QuOnto, which is fastest for all ontologies which feature only positive inclusions, with the exception of the EL-Galen, Galen, and FMA-OBO ontologies.
The CB reasoner, which is the best-performing reasoner for the Galen ontology, does not however always perform complete classification. For instance, it does not compute property hierarchies. The ELK reasoner instead is slower than QuOnto for three out of the five ontologies also in OWL 2 EL, while showing markedly better performance for EL-Galen.

Furthermore, if, as is usually the case, an ontology does not present unsatisfiable predicates, the computation of such predicates through the exploration of all negative inclusions can be avoided. This is the case for ontologies such as DOLCE and AEO, for which computation of the set Φ_T of positive inclusion assertions resulting from the transitive closure of G_T is performed respectively in 0.347 and 0.384 seconds, fastest among tested reasoners. Instead, for ontologies such as Pizza and Transportation, which feature respectively 2 and 62 unsatisfiable atomic concepts, the identification of all such predicates is unavoidable, and the resulting set of trivial inclusion assertions must be added to Ω_T.

5 Conclusions

The research presented in this paper can be extended in various directions. First of all, in the implementation of our technique we have adopted a naive algorithm for computing the digraph transitive closure. We are currently experimenting with more sophisticated and efficient techniques for this task. We are also working to optimize the procedure through which we identify unsatisfiable predicates. Finally, we are working to extend our technique to compute all inclusions that are inferred by the TBox (which, in OWL 2 QL, are a finite number). In this respect, we notice that through G_T* it is already possible to obtain the classification of all basic concepts, basic roles, and attributes, and not only that of predicates in the signature, and that, with slight modifications of computeUnsat, we can actually obtain the set of all negative inclusions inferred by an OWL 2 QL TBox.
The remaining challenge is to devise an efficient mechanism to obtain all inferred positive inclusions involving qualified existential roles and attribute domains.
Acknowledgments. This research has been partially supported by the EU under FP7 project Optique (grant no. FP7-318338), and by the EU under FP7-ICT project ACSI (grant no. 257593).
References
1. A. Acciarri, D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, M. Palmieri, and R. Rosati. QuOnto: Querying Ontologies. In M. Veloso and S. Kambhampati, editors, Proc. of AAAI 2005, pages 1670–1671. AAAI Press/The MIT Press, 2005.
2. A. Artale, D. Calvanese, R. Kontchakov, and M. Zakharyaschev. The DL-Lite family and relations. J. of Artificial Intelligence Research, 36:1–69, 2009.
3. M. Ashburner, C. Ball, J. Blake, D. Botstein, H. Butler, J. Cherry, A. Davis, K. Dolinski, S. Dwight, J. Eppig, et al. Gene Ontology: tool for the unification of biology. Nature Genetics, 25(1):25, 2000.
4. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, 2nd edition, 2007.
5. J. Bang-Jensen and G. Z. Gutin. Digraphs: Theory, Algorithms and Applications. Springer, 2nd edition, 2008.
6. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, M. Rodriguez-Muro, R. Rosati, M. Ruzzi, and D. F. Savo. The MASTRO system for ontology-based data access. Semantic Web J., 2(1):43–53, 2011.
7. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning, 39(3):385–429, 2007.
8. G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, R. Rosati, M. Ruzzi, and D. F. Savo. MASTRO: A reasoner for effective ontology-based data access. In Proc. of ORE-2012, volume 858 of CEUR, ceur-ws.org, 2012.
9. B. Glimm, I. Horrocks, B. Motik, R. Shearer, and G. Stoilos. A novel approach to ontology classification. J. of Web Semantics, 14:84–101, 2012.
10. C. Golbreich, S. Zhang, and O. Bodenreider. The foundational model of anatomy in OWL: Experience and perspectives. J. of Web Semantics, 4(3):181–195, 2006.
11. V. Haarslev and R. Möller. RACER system description. In R. Goré, A. Leitsch, and T. Nipkow, editors, Proc. of IJCAR 2001, volume 2083 of LNCS, pages 701–706. Springer, 2001.
12. Q. Ji, P. Haase, G. Qi, P. Hitzler, and S. Stadtmüller. RaDON - repair and diagnosis in ontology networks. In L. Aroyo, P. Traverso, F. Ciravegna, P. Cimiano, T. Heath, E. Hyvönen, R. Mizoguchi, E. Oren, M. Sabou, and E. P. B. Simperl, editors, Proc. of ESWC 2009, volume 5554 of LNCS, pages 863–867. Springer, 2009.
13. A. Kalyanpur, B. Parsia, E. Sirin, and J. A. Hendler. Debugging unsatisfiable classes in OWL ontologies. J. of Web Semantics, 3(4):268–293, 2005.
14. Y. Kazakov. Consequence-driven reasoning for Horn SHIQ ontologies. In C. Boutilier, editor, Proc. of IJCAI 2009, pages 2040–2045. AAAI Press, 2009.
15. Y. Kazakov, M. Krötzsch, and F. Simancik. Concurrent classification of EL ontologies. In L. Aroyo, C. Welty, H. Alani, J. Taylor, A. Bernstein, L. Kagal, N. F. Noy, and E. Blomqvist, editors, Proc. of ISWC 2011, volume 7031 of LNCS, pages 305–320. Springer, 2011.
16. M. Krötzsch. The not-so-easy task of computing class subsumptions in OWL RL. In P. Cudré-Mauroux, J. Heflin, E. Sirin, T. Tudorache, J. Euzenat, M. Hauswirth, J. X. Parreira, J. Hendler, G. Schreiber, A. Bernstein, and E. Blomqvist, editors, Proc. of ISWC 2012, volume 7649 of LNCS, pages 279–294. Springer, 2012.
17. M. Lawley and C. Bousquet. Fast classification in Protégé: Snorocket as an OWL 2 EL reasoner. In T. Meyer, M. Orgun, and K. Taylor, editors, Proc. of AOW 2010, volume 122 of CRPIT, pages 45–50. ACS, 2010.
18. D. Lembo, V. Santarelli, and D. F. Savo. Graph-based Ontology Classification in OWL 2 QL. In Proc. of ESWC 2013, 2013. (to appear).
19. C. Masolo, S. Borgo, A. Gangemi, N. Guarino, A. Oltramari, and L. Schneider. The WonderWeb library of foundational ontologies and the DOLCE ontology. Technical Report D17, WonderWeb, 2002.
20. J. Mendez, A. Ecke, and A. Turhan. Implementing completion-based inferences for the EL-family. In Proc. of DL 2011, volume 745 of CEUR, ceur-ws.org, 2011.
21. B. Motik, B. Cuenca Grau, I. Horrocks, Z. Wu, A. Fokoue, and C. Lutz. OWL 2 Web Ontology Language – Profiles (2nd edition). W3C Recommendation, World Wide Web Consortium, Dec. 2012. Available at http://www.w3.org/TR/owl2-profiles/.
22. J. Rogers and A. Rector. The GALEN ontology. Medical Informatics Europe (MIE 96), pages 174–178, 1996.
23. E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, and Y. Katz. Pellet: A practical OWL-DL reasoner. J. of Web Semantics, 5(2):51–53, 2007.
24. D. Tsarkov and I. Horrocks. FaCT++ description logic reasoner: System description. In U. Furbach and N. Shankar, editors, Proc. of IJCAR 2006, volume 4130 of LNCS, pages 292–297. Springer, 2006.