Towards fast Atomic Decomposition using Axiom Dependency Hypergraphs? Francisco Martı́n-Recuerda1 and Dirk Walther2 1 Universidad Politécnica de Madrid, Spain fmartinrecuerda@fi.upm.es 2 TU Dresden, Germany Center for Advancing Electronics Dresden dirk.walther@tu-dresden.de Abstract. Atomic decomposition of ontologies has been suggested as a tool to understand the modular structure of ontologies. It consists of a polynomial size representation of potentially exponentially many modules of an ontology. Tractable algorithms for computing the atomic decomposition for locality-based modules have been introduced, albeit leaving room for improvement in terms of running time. In this paper, we consider ontologies formulated in OWL-EL. We introduce a notion of an axiom dependency hypergraph for an ontology, which represents how axioms are included in locality-based modules. We use a standard al- gorithm from graph theory to compute strongly connected components of the axiom dependency hypergraph and show that such components correspond to atoms of the ontology. An empirical evaluation of the al- gorithm on large fragments of biomedical ontologies confirms a significant improvement in running time. 1 Introduction The atomic decomposition of an ontology provides a compact representation of all possible modules that can be extracted from a given ontology. It represents a significant contribution to a better understanding of the internal structure of the ontology. In this paper, we introduce a novel representation model of OWL-EL on- tologies based on directed hypergraphs that explicitly represent information for syntactic locality and atomic decomposition. We call this model Axiom Depen- dency Hypergraphs (ADHs). A directed hypergraph is a generalisation of a di- rected graph in which edges (in this case hyperedges) can connect two nonempty disjoint sets of nodes (or vertices). The nodes in such hypergraphs represent the axioms of an ontology, and hyperedges indicate subset relations between terms ? We thank the reviewers of the workshop WoMO 2013 for their comments. The first author acknowledges the support of the EU project SEALS (FP7-ICT-238975), and the second author the support of the German Research Foundation (DFG) within the Cluster of Excellence ‘Center for Advancing Electronics Dresden’. in axioms. By using standard hyperpath search algorithms it is possible to effi- ciently extract locality-based modules (in linear time) and to compute the atomic decomposition of the ontology. The use of hypergraph-based models for locality-based module extraction is not new [8] and it has been recently further extended [6]. However, this is the first time that it has been applied for improving existent algorithms for atomic decom- position of OWL ontologies. We compare a prototypical implementation of our hypergraph model against the fastest implementation of atomic decomposition that we are aware of [10]. Our experiments confirm a significant improvement in running time for (certain fragments of) biomedical ontologies. The paper is organised as follows. In Section 2, we review the main notions on syntactic ⊥-locality, atomic decomposition, and hypergraphs. In Section 3, we introduce axiom dependency hypergraphs and discuss their size, and evaluate the performance of atomic decomposition based on these graphs in Section 4. We conclude this paper in a final section. 2 Preliminaries In this paper, we consider ontologies formulated in the description logic EL, which allows for conjunction and existential restrictions. Large parts of, e.g., biomedical ontologies are expressed in EL. For notions on description logics, we refer to [3] and for EL to [2]. Moreover, we consider locality-based modules. The notion of locality refers to axioms, i.e., axioms are either local or non-local, and it depends on a signature. A module of an ontology consists of the non-local axioms wrt. a signature. There are two flavours the locality notion comes in: semantic and syntactic locality. Intuitively, an axiom is semantically local wrt. a signature Σ if it does not say anything about the terms in Σ.1 Checking whether an axiom is semantically local can be computationally expensive as it requires reasoning.2 The notion of syntactic locality was introduced to allow for efficient computation. Checking for syntactic locality involves checking that an axiom is of a certain form, no reasoning is needed. On the downside, syntactic locality is merely an approximation of semantic locality: all syntactically local axioms are also semantically local, but not vice versa. Modules based on syntactic locality can be larger as they contain the modules based on semantic locality and possibly more axioms. We refer to [5] for a more extensive introduction to locality-based modularity. The notion of ⊥-locality depends on whether or not a given signature covers all symbols occurring on the LHS of a general concept inclusion, or occurring on the LHS or RHS of a general concept equation. The following proposition makes that precise.3 1 For sets of axioms with this property the term ‘safety’ is used in the literature [5]. 2 Reasoning with OWL2 is 2-NExpTime-complete in the worst case, but also reasoning with a tractable logic such as EL can be seen as too difficult in some cases, depending on the size of the input and the application requirements. 3 The notion of reachability-based modules uses a similar property; see Def. 37 in [8]. Proposition 1. Let α be an EL-axiom. Let Σ be a signature. Then: (i) if α = (C v D), then α is not syntactic ⊥-local wrt. Σ iff sig(LHS(α)) ⊆ Σ; (ii) if α = (C ≡ D), then α is not syntactic ⊥-local wrt. Σ iff sig(LHS(α)) ⊆ Σ or sig(RHS(α)) ⊆ Σ. a The following example illustrates the notion of ⊥-locality of axioms. Example 1. Let α and β be the axioms:4 α := A1 u ∃r1 .(A2 u ∃r2 .A3 ) v ∃r3 .A4 β := A1 ≡ A2 u ∃r1 .A3 Axiom α is syntactic ⊥-local wrt. Σ if any of the symbols occurring in LHS(α) is not contained in Σ, i.e., if α satisfies any of the following conditions: – Ai ∈ / Σ, for some i ∈ {1, 2, 3}; or – r1 ∈ / Σ or r2 ∈ / Σ. Axiom β is syntactic ⊥-local wrt. Σ if there are two symbols, one occurring in LHS(β) and one in RHS(β), that are not contained in Σ, i.e., if β satisfies any of the following conditions: – A1 ∈ / Σ and Ai ∈ / Σ, for some i ∈ {2, 3}; or – A1 ∈ / Σ and r1 ∈ / Σ.  2.1 Extracting modules based on locality This is the module extraction algorithm from [5], where x stands for any of the semantic and syntactic locality notions, i.e. x ∈ {∅, ∆, ⊥, >}. The algorithm runs in time quadratic in the size of the ontology and signature. function ModxO (Σ) returns x-local module of O wrt. Σ 1: m := 0 2: Mm := ∅ 3: do 4: m := m + 1 5: Mm := {α ∈ O | α is not x-local wrt. Σ ∪ sig(Mm−1 )} 6: until Mm = Mm−1 7: return Mm The algorithm takes an ontology O and a signature Σ as input and returns a subset M of O. The computed set M consists of axioms that are not x- local wrt. Σ ∪ sig(M), or equivalently, O \ M consists of axioms that are local wrt. Σ ∪ sig(M). The algorithm produces a finite sequence M0 , . . . , Mn , n ≥ 0, of subsets of the ontology O, where Mn is the ⊥-local module of O wrt. 4 These axioms are of the forms that occur frequently in the biomedical ontology Full-Galen. Σ ∪ sig(Mn ) with Σ being the initial signature. This sequence induces a relation on O representing the successive inclusion of the axioms into the module. Later we will represent this relation as hyperedges in an axiom dependency hypergraph. This is illustrated in the following example. Example 2. The signature increases round-by-round due to axioms that are added to the module (line 5). In fact, the RHSs of the axioms introduce new sym- bols to the signature. Let αi = Ai v Ai+1 , for i ∈ {1, ..., n}. Let O = {α1 , ..., αn } and Σ = {A1 }. We run the algorithm on the input O and Σ. In the first round of the do-until loop (lines 3-6), the axiom α1 is the only axiom in O that is not ⊥-local wrt. Σ ∪ sig(M0 ) (line 5), where M0 = ∅ (line 2). So α1 is added to M1 . In the second round, ⊥-locality is checked for the extended signature Σ ∪ sig(M1 ) = {A1 , A2 }. Axioms α1 and α2 are now non-⊥ local wrt. Σ ∪ sig(M1 ) and both are added to M2 . The algorithm proceeds this way until all axioms of O are added to Mn . As no more axioms are added the algorithm exits the do-until loop (line 6) and returns Mn+1 = O as the ⊥-local module of O wrt. Σ (line 7). The following picture shows the module extraction process as a graph Gmod⊥ (O,Σ) = (V, E), where V = {α1 , ..., αn } and E = {(αi , Ai+1 , αi+1 ) | i ∈ {1, ..., n − 1}}, and the sequence M0 , ..., Mn , Mn+1 produced by the module extraction algorithm. Additionally we indicate with a tilted arrow labelled with A1 to node α1 that the concept name A1 is provided by the initial signature. In this case, α1 is the first axiom to be included by the module extraction algorithm. A1 R A2 @ @  A 3 -  A 4 - ... An-  M0 α1 - α2 α3 αn M1  M2  M3  Mn = Mn+1  2.2 Atomic decomposition Atoms represent sets of highly related axioms in the sense that they always co-occur in modules. A set a of axioms from an ontology O is an atom of O if for every module M of O, it holds that a ⊆ M or a ∩ M = ∅. We denote with AtomsO the set of all atoms of O. The atoms of an ontology partition the set of its axioms (i.e., each axiom occurs in exactly one atom). A dependency relation between pairs of atoms can be established: an atom a2 depends on an atom a1 in an ontology O (written a1 -locality can be defined in a similar way. 10 Note that the minimality condition in Def. 1 is to avoid superfluous hyperedges, but it is not required in terms of correctness. Proposition 2. Let O be an EL-ontology and α an EL-axiom. Then: Mod⊥ O (α) = ⊥ {β | α is B-connected to β in HO }. a Reachability-based modules have been defined in other hypergraph respresenta- tions of ontologies in [8]. This proposition can be shown similarly. The set of atoms for an EL-ontology O corresponds to the set of strongly con- ⊥ nected components in the axiom dependency hypergraph for O. Let SCCs(HO ) ⊥ be the set of strongly connected components of the hypergraph HO . Proposition 3. Let O be an EL-ontology. Then: Atoms⊥ ⊥ O = SCCs(HO ). a The proposition can readily be seen as follows. As a corollary of Proposi- tion 2, mutually reachable axioms are contained in the same modules, which is equivalent to the axioms forming an atom of the ontology. The nodes in a strongly connected component are all mutually reachable. Thus, both notions are equivalent. The dependencies between atoms for an EL-ontology O (as given by the re- lation <⊥O ) correspond to a certain binary relation over the set of strongly con- ⊥ ⊥ nected components of HO , which is defined as follows. For S1 , S2 ∈ SCCs(HO ), ⊥ we say that S2 depends on S1 (written S1 →O S2 ) if there is an hyperedge ⊥ ∗ e = (T (e), H(e)) in HO such that T (e) ⊆ S1 and H(e) ⊆ S2 . Let →⊥ O be the reflexive, transitive closure of →⊥O. Proposition 4. Let O be an EL-ontology. Let a1 , a2 ∈ Atoms⊥ O and S1 , S2 ∈ ⊥ ⊥∗ SCCs(HO ) such that a1 = S1 and a2 = S2 . Then: a1 <⊥ a O 2 iff S1 →O S2 . a Example 4. Consider again the ontology O and its axiom dependency hyper- ⊥ graph HO from Example 3. We obtain the following ⊥-local modules for the axioms: Mod⊥ O (α1 ) = {α1 , α4 , α5 } Mod⊥ O (α4 ) = {α1 , α4 , α5 } Mod⊥ O (α2 ) = {α1 , α2 , α3 , α4 , α5 } ⊥ ModO (α5 ) = {α1 , α4 , α5 } Mod⊥ O (α3 ) = {α1 , α2 , α3 , α4 , α5 } The resulting atoms in AtomsO are a1 = {α2 , α3 } and a2 = {α1 , α4 , α5 }, where ⊥ a1 < a2 , i.e., a2 depends on a1 . The ADH HO and the atoms of O can be depicted as follows:    α2  α1  α5  *   HH Y H H   HHH 6  HHH  ?  HHH  HH j H ?  α3 - α4   a1 a2 ⊥ Consider the strongly connected components of HO . Axiom α1 is B-connected with the axioms α4 and α5 , α4 is B-connected with α1 and α5 , and α5 is B- connected with α1 and α4 . Axiom α2 is B-connected with α3 and vice versa. Axioms α2 , α3 are each B-connected with α1 , α4 and α5 , but not vice versa. ⊥ Hence, {α1 , α2 , α4 } and {α2 , α3 } are the strongly connected components of HO . Moreover, we say that the former component depends on the latter as any two axioms contained in them are unilaterally and not mutually B-connected. Note that the atoms a1 and a2 of O and their dependency coincide with the strongly connected components of HO ⊥ .  For ontologies consisting of axioms of the form A v C, where A is a concept name, the locality dependencies between axioms can be represented in directed graphs (i.e., hypergraphs in which the tail of any hyperedge contains only one axiom). For such ontologies O, the definition of the axiom dependency hyper- ⊥ graph HO = (V, E) can be simplified (cf. Def. 1). The condition for a hyperedge e = ({α}, {β}), for α, β ∈ V, to be contained in E is now: e = ({α}, {β}) ∈ E iff sig(LHS(β)) ⊆ sig(α) The advantage of this representation is that we can directly use a standard linear time algorithm for calculating strongly connected components of directed graphs [9, 7]. We notice the majority of axioms in biomedical ontologies (that we have considered in the evaluation part of this paper) are axioms of the form A v C. For general ontologies, we can apply a three-phase approach similar to the one suggested in [1] for hypergraphs: first, we compute the strongly connected components for axioms of the form A v C using a linear-time algorithm; second, we collapse the strongly connected components into single nodes; and, finally, we compute the strongly connected components of the revised axiom dependency hypergraph using the notion of mutual reachability (which can be computed in at most quadratic time [1]). 3.1 Size of axiom dependency hypergraphs (worst-case) We observe that axiom dependency hypergraphs may be of exponential size. Proposition 5. There exists ontologies O such that the axiom dependency hy- pergraph HO contains exponentially many hyperedges in the number of axioms of O. a We show the proposition with the following example. Example 5. Let n, k be two natural numbers with n > 0 and k > 0. Ontology On,k is defined as: On,k = {α := X1 u. . .uXn v Y } ∪ {αji := Aij v Xi | i ∈ {1, ..., n}, j ∈ {1, ..., k}} Axiom α is the only axiom in On,k with a complex LHS, which in turn is a conjunction of n many concept names X1 , ..., Xn . The axioms αji state that each concept name Xi subsumes k many concept names Ai1 , ..., Aik . The corresponding axiom dependency graph HOn,k is the the tuple HOn,k = (V, E), where V = On,k and E = {({β1 , ..., βn }, {α}) | βi ∈ {α1i , ..., αki }, i ∈ {1, ..., n}}. It can readily be seen that HOn,k contains k n many hyperedges, where n, k are each bound by the number of axioms in On,k .  The axiom dependency hypergraph serves as a tool to determine the atoms of an ontology via calculating the strongly connected components of the graph. Example 5 illustrates a problem with the size of the axiom dependency hyper- graphs as defined in Def. 1. We note, however, that no hyperedge in HOn,k is needed for the purpose of determining the atoms for On,k . For every axiom β in On,k , the ⊥-local module Mod⊥ O (β) is a singleton set consisting of β itself. Conse- quently, the atoms of On,k are singleton sets as well as are the strongly connected components of HOn,k . Note that we obtain the same strongly connected compo- nents after removing every hyperedge from HOn,k . This observation suggests a possible solution to avoid the exponential blow-up problem. The idea is to build the hypergraph using only the “necessary” hyperedges that are required for the computation of the strongly connected components which in turn correspond to atoms. We hypothesize that the hyperedges needed are the ones that preserve the mutual reachability relation between the axioms of the ontology. This means that no more incoming hyperedges are needed per axiom than the total number of axioms in the ontology. We leave a formal treatment of this idea for future work. 3.2 Size of axiom dependency hypergraphs for real ontologies Primitive concept inclusions, i.e. axioms of the form A v C, where A is a concept name and C a possibly complex concept, are of prime importance for biomedical ontologies. For instance, the majority of axioms in the biomedical ontologies from the Bioportal in the following table are primitive concept inclusions.11 Ontology Signature #axioms percentage #axioms #role size A v C of all axioms C ≡ D axioms in ontology CPO (12/2011) 136 090 306 111 80.61% 73 461 96 FMA-lite 75 168 119 558 99.99% 0 3 Full-Galen (v1.1) 24 088 25 563 67.81% 9 968 2 165 GO v1.1 (1986) 34 220 61 990 99.99% 0 5 NCBI (v1.2) 847 796 847 755 100% 0 0 RH-MeSH (08/2012) 286 382 403 210 100% 0 0 Snomed CT (2010a) 291 207 227 698 78.18% 63 446 12 Most ontologies contain other axioms as well such as concept equations and role axioms.12 The table shows the percentage of axioms of the form A v C to all axioms in each ontology. The ontologies NCBI and RH-Mesh are taxonomies (no logical operators); they consist entirely of axioms of the form A v B, where A, B are concept names. 11 http://bioportal.bioontology.org 12 Other axiom types are not shown here, e.g., disjointness axioms in CPO. Ontology Signat. ADH A v C-fragment size #nodes #edges #SCCs CPO (12/2011) 136 027 306 111 1 474 962 131 482 FMA-lite 75 141 119 558 1 021 863 54 649 Full-Galen (v1.1) 16 412 25 563 179 770 12 502 GO v1.1 (1986) 34 175 61 990 255 215 34 167 NCBI (v1.2) 847 760 847 755 1 695 474 847 755 RH-MeSH (08/2012) 286 380 403 210 1 435 631 286 263 Snomed CT (2010a) 240 021 227 698 556 474 227 698 The difference between the number of SCCs and the number of nodes in the axiom dependency hypergraph can be seen as a measure of cyclicity of the graph. The following example illustrates the limitations of axiom dependency graphs. Axioms of the form C v D or C ≡ D (i.e. axioms whose ⊥-locality signatures contains more than one symbol) can cause many hyperedges to be included in the axiom dependency hypergraph. Example 6. Consider this concept equation α from the ontology Full-Galen: Incontinence ≡ Transport u ∃ actsOn.BodySubstance u ∃ hasIntentionality.(Intentionality u ∃ hasAbsoluteState.involuntary) u ∃ hasUniqueAssociatedDisplacement.(Displacement u ∃ isDisplacementTo.GRAILExteriorOfBody) ⊥ The axiom dependency hypergraph HFull-Galen contains up to 9.2 × 1012 many hyperedges of the form e = (T (e), {α}), where T (e) is a set of axioms containing the 12 signature symbols from ⊥-sig(α).13  As indicated in Section 3.1, the number of hyperedges may be reduced while preserving the strongly connected components. The idea is that no more incom- ing hyperedges are needed per axiom as there are axioms in the ontology. In the case of Full-Galen, we have an upper bound of 37 696 many incoming hyperedges per axiom. In particular, for axiom α from Example 6 we estimate up to 21 095 many hyperedges to preserve the mutual reachability relation involving α. 4 Evaluation We have implemented a Java program that takes an EL-ontology O (with axioms of the form A v C) as an input and builds the corresponding axiom dependency ⊥ ⊥ hypergraph HO . Then it computes the set SCCs(HO ) of strongly connected com- ⊥ ponents of HO and the dependencies between these components. We compare the running times of atomic decomposition using FaCT++ with the times needed by the hypergraph-based approach. The following table lists the results.14 13 The number of hyperedges is merely an estimated upper bound (as it does not take into account the minimality condition in Def. 1). 14 All experiments were performed on an Intel Xeon E5-2640 2.50 GHz with Java version 1.7.0 40 (command line parameters -Xss1G -Xms4G -Xmx8G). We used FaCT++, 64bit, Version 1.6.2 (19 February 2013) and the OWL API version 3.4.4. Ontology fragment time time hypergraph-based approach speedup with axioms of the FaCT++ load build comp. comp. total form A v C ont. ADH SCC deps. CPO (12/2011) 1 262.69 s 9.99 s 0.85 s 0.59 s 0.52 s 11.94 s 105.8 FMA-lite 19 991.92 s 9.50 s 0.49 s 6.43 s 0.20 s 16.62 s 1 202.9 Full-Galen (v1.1) 115.77 s 2.87 s 0.09 s 0.20 s 0.05 s 3.21 s 36.1 GO v1.1 (1986) 44.33 s 3.42 s 0.15 s 0.16 s 0.09 s 3.82 s 11.6 NCBI (v1.2) 49 227.86 s 73.72 s 1.42 s 0.87 s 1.19 s 77.22 s 637.5 RH-MeSH (08/2012) 6 921.62 s 15.24 s 1.31 s 0.63 s 0.66 s 17.84 s 388.0 Snomed CT (2010a) 4 538.65 s 14.47 s 0.48 s 0.31 s 0.32 s 15.58 s 291.3 The times for FaCT++ are the total times needed (2nd column), which in- cludes loading the ontology, initialising the reasoner and computing the atoms (but not saving the atoms). For the hypergraph-based approach, we have listed the times needed for the OWL-API to load the ontology and to do some initial- isation (3rd column), to build the axiom dependency hypergraph (4th column), to compute the strongly connected components of the graph (5th column), to compute the dependencies between the strongly connected components (6th col- umn) and the total time needed (6th column). The last column shows for each ontology the speedup achieved for atomic decomposition using the hypergraph- based approach compared to FaCT++. On FMA-lite an over 1 000-fold speedup was realised. FaCT++ improves upon the original algorithm for atomic decomposition [11] as described in [10]. However, the algorithm implemented in FaCT++ runs in time almost cubic (in particular, the computation of the transitive closure in Line 7 of Algorithm 4 in [10]). In contrast, the approach based on axiom de- pendency hypergraphs runs in time linear for the considered fragment of the ontologies. 5 Conclusion We have suggested a hypergraph-based method for efficient atomic decompo- sition of EL-ontologies consisting of primitive concept inclusions. We have in- troduced the notion of an axiom dependency hypergraph, in which, different to other hypergraph representations of ontologies, axioms are nodes that are connected in a way mimicking how axioms are included in a module by the locality-based module extraction algorithm. Modularisation and atomic decomposition has been suggested as a means to understand the internal structure of ontologies. Axiom dependency hypergraphs are an explicit representation of these notions. A module can be extracted from the hypergraph by collecting the nodes reachable from the nodes that are deter- mined by the input signature. Moreover, it is possible to directly apply standard off-the-shelf graph algorithms for computing the atoms of certain EL-ontologies in linear time. For the atomic decomposition of FMA-lite, a staggering 1 000-fold speedup could be achieved compared to the reasoner FaCT++. The disadvantage of the axiom dependency hypergraphs, as introduced in this paper, is their exponential size in the worst case for general concept inclu- sions. We have indicated that the blow-up in size can be avoided by omitting hyperedges while still preserving the mutual reachability relation between ax- ioms. In this way, no more than quadratically many hyperedges are required for determining the atoms. For future work, the idea of avoiding the exponential blow-up of the ax- iom dependency hypergraphs for general EL-ontologies needs to be formalised, implemented and evaluated. It would also be interesting to extend the current approach to other locality notions such as >- and ⊥>∗ -locality and to richer languages such as SROIQ. References 1. X. Allamigeon. Strongly connected components of directed hypergraphs. CoRR, abs/1112.1444, 2011. 2. F. Baader, S. Brandt, and C. Lutz. Pushing the EL envelope. In Proc. of IJCAI-05. Morgan-Kaufmann Publishers, 2005. 3. F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The description logic handbook: theory, implementation, and applications. Cambridge University Press, 2007. 4. G. Gallo, G. Longo, S. Pallottino, and S. Nguyen. Directed hypergraphs and applications. Discrete Applied Mathematics, 42(23):177–201, 1993. 5. B. C. Grau, I. Horrocks, Y. Kazakov, and U. Sattler. Modular reuse of ontologies: theory and practice. JAIR, 31:273–318, 2008. 6. R. Nortje, A. Britz, and T. Meyer. A normal form for hypergraph-based module extraction for SROIQ. In Proc. of AOW 2012, volume 969 of CEUR Workshop Proceedings, pages 40–51. CEUR-WS.org, 2012. 7. M. Sharir. A strong connectivity algorithm and its applications to data flow anal- ysis. Computers & Mathematics with Applications, 7(1):67–72, 1981. 8. B. Suntisrivaraporn. Polynomial time reasoning support for design and mainte- nance of large-scale biomedical ontologies. PhD thesis, TU Dresden, Germany, 2009. 9. R. E. Tarjan. Depth-first search and linear graph algorithms. SIAM J. Comput., 1(2):146–160, 1972. 10. D. Tsarkov. Improved algorithms for module extraction and atomic decomposition. In Proc. of DL’12, volume 846 of CEUR Workshop Proceedings. CEUR-WS.org, 2012. 11. C. D. Vescovo, B. Parsia, U. Sattler, and T. Schneider. The modular structure of an ontology: Atomic decomposition. In Proc. of IJCAI’11, pages 2232–2237, 2011.