Constructing E − SHIQ Distributed Knowledge Bases via Ontology Modularization: The mONTul method

Constructing E − SHIQ Distributed Knowledge Bases via Ontology Modularization: The mONTul method GeorgiosSantipantakis University of the Aegean

Greece

GeorgeVouros University of Piraeus

Greece

Constructing E − SHIQ Distributed Knowledge Bases via Ontology Modularization: The mONTul method F3E07CE50896A962B45C0590D09DFF77 GROBID - A machine learning software for extracting information from scholarly documents ontology modularization locality constraint satisfaction problem correct and complete reasoning

This article presents a reconfigurable method for the modularization of SHIQ ontologies, towards the construction of distributed E − SHIQ knowledge bases. The aim is to compute decompositions for correct, complete and efficient distributed reasoning. The proposed method combines graph-based modularization techniques with localitybased rules using a generic constraint problem solving framework. The paper presents experimental results concerning the modularization task, w.r.t. specific requirements for efficient distributed reasoning.

Introduction

Modularization denotes either the extraction of modules from large ontologies (e.g. for reusability purposes) or the partitioning of ontologies into modules (e.g. for facilitating evolution or reasoning tasks). In both cases, modules should not represent arbitrary chunks of knowledge. Each module must represent aspects from a specific sub-domain of the original ontology. Considering the whole decomposition of an ontology, all and only those entailments of the original ontology must be entailed by reasoning over the collection of modules extracted. It follows that the modular ontology must be in a decidable fragment of a representation language.

The aim in this paper is to propose a method for decomposing any ontology in the SHIQ fragment of Description Logics (DL) into an arbitrary number of modules that form a distributed E − SHIQ knowledge base, enabling correct and complete reasoning. Each module must preserve to a great extent the meaning of the terms in its signature. The E − SHIQ representation framework allows modules to be semantically associated, thus reducing replication of axioms between them. It follows that a distributed knowledge base is a network of associated modules and the meaning of terms in the decomposition is preserved by considering the coupling of modules. The decomposition can facilitate efficient reasoning using the sound and complete E − SHIQ distributed reasoner.

Targeting efficient distributed reasoning tasks, as shown by the experiments performed using the E − SHIQ reasoner [9], additional properties that must be preserved by the networks of associated modules include: (a) Axioms must be distributed between modules in an as much as possible even way, (b) networks of modules must have a small diameter, and (c) there must not be many (and large) cycles of associated modules.

Existing modularization methods [6] are based on logical foundations, or may apply methods manipulating graphs that resemble the structure of the original ontology. While logic-based methods preserve certain logical properties (e.g. [1], [2], [3], [4], [5]), graph-based approaches apply heuristics to partition (e.g. [7], [8]) an ontology into modules. These methods usually utilize network metrics and empirical "distance measures" to define the module boundaries. The proposed modularization method combines graph-based modularization techniques with localitybased rules using a generic constraint problem solving framework. Ontology specifications are used for building a graph of dependencies among the ontology terms. These dependencies correspond to specific constraints concerning the assignment of terms in modules. The satisfaction of these constraints secure the construction of a well-formed E − SHIQ distributed knowledge base and specify conditions for locality-based modularization.

The specific contributions in this paper are the following: (a) It proposes the mONTul modularization method that combines rules for constructing well-formed E − SHIQ knowledge bases with rules for locality-based modularization in a constraint satisfaction problem. (b) The mONTul method proposed is reconfigurable and generic in several aspects: (i) The set of constraints may change w.r.t. the expressiveness of the original ontology and the modularization requirements, (ii) constraints may be considered to be soft or hard, in any arbitrary combination, (iii) mONTul can be used for extracting modules, although the focus here is on computing partitionings; and finally, (iv) the method itself has been developed in a modular architecture, allowing components to be replaced by state-of-the-art methods. (c) It revises the notion of locality by also considering possible associations between terms of different modules in a distributed E − SHIQ knowledge base.

In the next sections, Section 1 briefly presents the background knowledge and Section 2 revises the notion of locality for E − SHIQ. Section 3 presents the mONTul method, Section 4 the experimental results and Section 5 concludes the paper.

Background Knowledge

This section presents background knowledge on the E − SHIQ representation framework [9] and on locality-based modularization [1]. To provide concrete examples, we consider the ontology O with the axioms:

Conf erence ≡ M edicalConf erence OtherConf erence (1)

Conf erence Event

(2) Event HumanActivity

(3) P ediatricConf erence M edicalConf erence (4) Article P ublishedM aterial ∃P resentedAt.Event

(5) M edicalArticle Article ∀P resentedAt.M edicalConf erence (6) The E-SHIQ representation framework [9] belongs to the family of modular representation frameworks for Description Logics. It provides constructors for associating ontology units (modules) that are within the SHIQ fragment of Description Logics, preserving decidability.

Given a non-empty set of indices I and a collection of modules indexed by I, let N Ci , N Ri and N Oi be the sets of concept, role and individual names for unit i ∈ I, respectively. For some R ∈ N Ri , Inv(R) denotes the inverse role of R and (N Ri ∪ {Inv(R)|R ∈ N Ri }) is the set of SHIQ i-roles, i.e. the roles of the i-th ontology M i . An i-role axiom is either a role inclusion axiom or a transitivity axiom. Let R i be the set of i-role axioms.

Let E ij be the set of ij-link relations relating individuals in M i and M j , i = j ∈ I. The sets of link relations are not pairwise disjoint, but are disjoint with respect to the set of concept names. An ij-relation box R ij includes a finite set of ij − link relation inclusion axioms in case i = j, and transitivity axioms of the form T rans(E, (i, j)), where E is in (E ij ∩ N Ri ), i.e. it is an ij-link relation and an i-role. In case i = j, then R ij =R i (with an abuse of notation) includes a finite set of i − role inclusion axioms. Subsequently we use the term property to denote both roles and link-relations.

Briefly, the sets of i-concepts are inductively defined by the constructors within the SHIQ fragment of Description Logics, where i-roles can be replaced by ij-link relations, relating instances of the i-concept to instances of a j-concept, where i = j ∈ I. Let i : C and i : D possibly complex concepts and i : C i : D (or i : C D) a general concept inclusion (GCI) axiom in M i . A finite set of GCI's in M i is a TBox for i and it is denoted by T i .

Concept correspondences may be concept onto concept, or concept into concept: Let C ∈ N Ci , D ∈ N Cj with i = j ∈ I. A concept onto (into) concept correspondence from M i to M j that subjectively holds for M j , is of the form i : C → j : D (corresp. i : C → j : D).

Definition (Distributed Knowledge Base).

A distributed knowledge base Σ = T, R, C is composed of the distributed TBox T, the distributed RBox R, and a tuple of sets of correspondences C = (C ij ) i =j∈I between modules. A distributed TBox is a tuple of TBoxes T= (T i ) i∈I , where each T i is a finite set of i-concept inclusion axioms. A distributed RBox is a tuple of ij-property boxes R = (R ij ) i,j∈I , where each R ij is a finite set of property inclusion axioms and transitivity axioms.

A distributed ABox (DAB) includes a tuple of ABox'es A i for each ontology unit M i , and sets A ij , i = j with individual correspondences of the form j:a = → i:b, and property assertions of the form (a, b) : E ij , where E ij is an ij-link relation in E ij , i = j. Thus, individual correspondences are specified from the subjective point of view of M i and, together with assertions concerning linked individuals, these are made locally available to M i . Example: Let us for instance consider two units M 1 , M 2 for I = {1, 2}, computed from ontology O. Then, the E − SHIQ distributed knowledge base is Σ = T, R, C , and is composed by the distributed TBox T, the distributed RBox R, and a tuple of sets of correspondences C = (C ij ) i =j∈I between ontology units. Specifically, -T= (Ti)i∈I , where T1 = {1, 2, 3, 4}, (indicating the axioms included) and T2 = {5, 6}.

-R = ((Ri)i∈I , (Rij) i =j∈I ), where Ri = Rij = ∅, i, j ∈ I,

-C= (Cij) i =j∈I , where C21 = {2 : M edicalConf erence ≡ → 1 : M edicalConf erence, 2 : Event ≡ → 1 : Event} C12 = {1 : M edicalConf erence ≡ → 2 : M edicalConf erence, 1 : Event ≡ → 2 : Event} -DAB = ((Ai)i∈I , (Aij) i =j∈I )

, where Ai = ∅ and Aij = ∅, for any i, j ∈ I.

For the sake of brevity, we use equivalence subjective correspondences: These are actually specified by means of onto and into subjective correspondences. Please notice that both modules hold subjective knowledge on concepts correspondences, and these are symmetric. Finally, it must be noticed that the property P resentedAt is a 2-role, only: E − SHIQ does not support correspondences between roles.

A distributed knowledge base forms a network of associated (via correspondences and link relations) modules. Subsequently we use the terms network (of modules), decomposition and distributed knowledge base interchangeably. Associations are directional and may form cycles in the network of modules (as also shown in the example above).

Definition (Domain relations). Domain relations rij , i = j ∈ I represent equalities between individuals, from the subjective point of view of j. A domain relation r ij , i = j from ∆ i to ∆ j is a subset of ∆ i × ∆ j , s.t. in case d ∈ r ij (d 1 ) and d ∈ r ij (d 2 ), then according to the subjective view of j, d 1 = d 2 (denoted by d 1 = j d 2 ). Also, given a subset D of ∆ Ii , r ij (D) denotes ∪ d∈D r ij (d).

Given that domain relations represent equalities, in case

d 1 ∈ r ij (d) and d 2 ∈ r ij (d), then d 1 = j d 2 .

Therefore, E − SHIQ domain relations are globally one-to-one relations.

Definition (Distributed Interpretation). Given the index I and i, j ∈ I, a distributed interpretation I of a distributed knowledge base Σ is the tuple formed by the interpretations I ij = ∆ i , ∆ j , • Iij , i, j ∈ I, and a set of domain relations r ij , in case i = j ∈ I. Formally, I = (I ij ) i,j∈I , (r ij ) i =j∈I .

A local interpretation I i satisfies an i-concept C w.r.t. a distributed knowledge base Σ, i.e.

I i i : C iff C Ii = ∅. I i satisfies an axiom C D between i-concepts (i.e. I i i : C D) if C Ii ⊆ D Ii . Also, I ij satisfies an ij-property inclusion axiom R S (I ij R S) if R Iij ⊆ S Iij . A transitivity axiom T rans(E; (i, j)) is satisfied by I iff E Ii ∪ E Iij is transitive.

Definition (Distributed entailment and satisfiability). Σ d X Y if for every I, I d Σ implies I d X Y , where X and Y are either i-concepts or ij-properties, i, j ∈ I. Σ is satisfiable if there exists a I s.t. I d Σ. A concept i:C is satisfiable with respect to Σ if there is a I s.t. I d Σ and C Ii = ∅.

The E − SHIQ distributed reasoner implements a sound and complete tableau algorithm [9] for combining local reasoning chunks corresponding to the individual modules in a peer-to-peer fashion, inherently supporting the propagation of subsumptions between reasoning peers. The key mechanism for distributed reasoning is the projection of nodes from the completion graph of one peer (with a module) to another peer (with an associated module). This allows reasoning peers to combine their local (subjective) knowledge with the knowledge that other peers hold, in order to jointly compute entailments. Through projections, peers propagate subsumptions through their connections to distant peers. Each projec-tion request from a peer to another can trigger further projections from the later, resulting to "avalanches" of triggered projections across paths in the network of associated modules. Due to this, as also shown in [9], the distributed reasoner is affected by (a) the distribution of axioms in modules, (b) the diameter of the network of modules, and (c) the number of cycles in the network of modules.

Locality-based modularization. The locality-based modularization extracts a module for a given signature s.t. it can compute all and only the entailment of the original ontology involving the terms in the signature. Formally, a module M for an ontology O in the language L, w.r.t. a signature S is an ontology M ⊆ O s.t. M and O entail the same axioms over S in L. As shown in [5], this notion can be formalized using the notion of model-based conservative extensions. In this case, every model of a module M of O can be extended to a model of O without changing the interpretation domain or the interpretation of symbols in S. Thus, given an O and S ⊆ Sig(O), a module M ⊆ O is defined to be a module of O for S if O is a model conservative extension of M for S.

Since the problem of checking whether any "part" M is a module of O for a signature S is undecidable for fairly lightweight fragments of OWL [4], [5], we need to compute approximations. According to [1] and [2], a sufficient condition for a conservative model is locality: Definition (∅-locality). Let S be a signature. An interpretation is ∅-local for S if for every class A and property R not in S, we have

A I = R I = ∅. An axiom α is ∅-local for S if I α for each I that is ∅-local for S. An ontology O is ∅-local for S if every axiom in O is ∅-local for S.

Example. We can easily check that axioms (5) and ( 6) in O\M 1 are ∅-local for Sig(M 1 )= {Conference, MedicalConference, OtherConference, Event, HumanActivity, PediatricConference}: Any interpretation I that interprets all symbols in Sig(O) \ Sig(M 1 ) as the empty set, is ∅-local for S=Sig(M 1 ). However, M 1 is not ∅-local for Sig(M 2 ) if we consider that Sig(M 2 ) includes the terms Event and/or M edicalConf erence: Axiom (3) for instance can not be made ∅local for any interpretation that interprets HumanActivity as the empty set, since Event ∈ Sig(M 2 ). This is the case for the axiom (1), as well, due to the concept M edicalConf erence.

Since checking ∅-locality is proved costly as well, we can use the following syntactic conditions for checking ⊥-locality for a specific signature S. ⊥-locality implies ∅-locality [1], [2].

Given a signature S ⊆ Sig(O) for a SHIQ ontology O, the following grammar recursively defines the positive ⊥-concepts, denoted as C + S , for S:

C + S ::= A + | ¬C − | C C + | ∃R + .C | ∃R.C + |≥ nR + .C |≥ nR.C +(= ¬C + | C − 1 C − 2 | C − C | ∀R + .C | ∀R.C − |≤ nR + .C |≤ nR.C − (8)

The other constructs of SHIQ can be expressed using the above constructors, so they can be used in local concepts as well.

Definition (⊥-module).

Let O be an ontology and let S be a signature,

S ⊆ Sig(O). M ⊆ O is a ⊥-module for O for S, if O\M is ⊥-local for S ∪ Sig(M).

Example. We can easily check that M 2 = O\M 1 with axioms 5 and 6 is ⊥-local for S ∪ Sig(M 1 ) = {Conference, MedicalConference, OtherConference, Event, HumanActivity, PediatricConference}.

E − SHIQ locality-based modules

Given a distributed E − SHIQ knowledge base indexed by I, we have to specify the rules for deciding when an individual E − SHIQ module M i , i ∈ I, is a ⊥−module for its signature. To do this, we need first to refine the rules for ⊥concepts specified in formulae ( 7) and ( 8), so as (a) to assure that i-roles are specified for i-concepts only, i ∈ I, and (b) to incorporate the cases where iconcepts are constructed by means of restrictions on link relations.

Therefore, denoting a role in an E − SHIQ knowledge base with R and a link relation with E, then for an E − SHIQ module M i , i ∈ I, the rules for positive and negative ⊥−concepts for S = Sig(M i ) are:

C + S ::= A + | ¬C − | C C + | ∃R + .C + | ∃E + .C − |≥ nR + .C + |≥ nE + .C − (9) C − S ::= ¬C + | C − 1 C − 2 | C − C | ∀R − .C − | ∀E + .C − |≤ nR − .C − |≤ nE + .C − (10)

Example. Considering our example with M 1 including axioms {1, 2, 3, 4} and M 2 the axioms {5, 6}, then for Sig(M 1 ) = {Conference, MedicalConference,OtherConference, Conference,Event, HumanActivity, PediatricConference}, the concept ∃P resentedAt.Event is ⊥−local for Sig(M 1 ), if P resentedAt is a 21-link relation, according to the fifth rule for positive ⊥−concepts.

Forming P resentedAt as a 21-link relation makes the M 2 -concept ∀P resentedAt.M edicalConf erence a negative ⊥−concept for Sig(M 1 ). Nevertheless, in this case the module M 2 is ⊥−local for Sig(M 1 ), given that Article is a positive ⊥−concept for Sig(M 1 ).

The above example shows that concepts of the form ∀R.C may present difficulties for maintaining the ⊥−locality of E − SHIQ, depending on the context of their appearance.

To completely define ⊥−locality for E − SHIQ, we must also consider the subjective correspondences between concept names. Considering subjective onto concept correspondences between the module M i and any other module M j , i = j ∈ I, in a distributed E − SHIQ knowledge base, then similarly to GCIs for the module M i , the following rule holds for making an onto correspondence ⊥−local for Sig(M j ):

j : C → i : C + S

At this point we have to clarify that correspondences that M i subjectively holds do not affect the locality of M j .

In the general case, into correspondences of M i , can not be ⊥−local for Sig(M j ), since they are of the form:

j : C − S → i : C + S

Nevertheless, such a subjective correspondence can be ⊥−local for S = Sig(M j )\(Sig(M j ) ∩ Sig(M i )) when the same concept name appears in both sides of the correspondence and this concept name belongs in Sig(M i ). Such a correspondence is of the form j : C + S → i : C + S , where S = Sig(Mj)\Sig(Mi).

It must be noticed that the above hold for equivalence correspondences as well, since these are conjunctions of onto and into subjective correspondences.

Example: Considering again our running example, let us consider that axioms (2) and ( 3) in M 1 are formed as correspondences. Thus, the knowledge base becomes as follows: M 1 includes the axioms {1, 4}, M 2 the axioms {5, 6}, S = Sig(M 1 ) = {Conference, MedicalConference,OtherConference,HumanActivity, PediatricConference} and Sig(M 2 ) = Sig(O\M 1 ). There are also two correspondences between modules: 1:Conference − S → 2:Event + S and 1:HumanActivity − S → 2:Event + S . Focusing on the correspondences, we can observe that the first one is not ⊥ − local for M 2 (both correspondences are from the subjective point of view of M 2 ), while the second correspondence is ⊥ − local for M 2 . Therefore, M 2 is not local for Sig(M 1 ).

Given that only into correspondences can bound the size of interpretations for concepts in module M i , and in order to maintain the notion of locality in these cases, we revisit the definition of ⊥-module for E − SHIQ, considering the restricted signature of such a module, denoted by S * : Subsequently, the module O\M is called a ⊥ − E-local module for S * = S ∪ Sig * (M). In case Sig * (M) = Sig(M), then O\M is ⊥−local. Thus, every ⊥−local (⊥−module) is a ⊥−E-local module (resp. ⊥−E−module), but not viceversa. It must be noticed that the existence of onto correspondences do not harm the completeness and correctness of the reasoning tasks, given that reasoning tasks combine the (subjective) knowledge of all associated modules.

Definition (⊥ − E-module).

The E −SHIQ constructors offer many options for associating different modules, either via link-relations, or via concept correspondences, or via combinations of these: These are also alternative options for the modularization method. Nevertheless, not all combinations are valid for E − SHIQ. For instance, lack of role-to-role correspondences in E − SHIQ, pose the restrictions that a role hierarchy is set in a single module.

The mONTul method

The intuition behind the proposed modularization method is to keep highlydependent ontology terms in the same module, subject to satisfying the constraints for ⊥ − E−local modules w.r.t. their signature. Please recall that the major aim is not to provide locality-preserving modularizations per se, but to compute decompositions that are complete and correct for reasoning tasks w.r.t. locality-preserving constraints, towards enhancing the efficiency of distributed reasoning tasks. Thus, locality constraints drive the decomposition process, reducing the several options that the method may consider for constructing distributed E −SHIQ knowledge bases, preserving the coherency of the subject matters that each module includes. The major steps of the method are as follows:

a) Construction of a dependency graph for specifying the dependencies between concepts and roles, according to the original theory, b) Clustering concepts and roles into groups so as to satisfy the specified constraints (as much as possible) and to decide the signature of modules; and finally, c) The construction of modules and of their associations. Finally, the network of associated modules is further "tuned" so as to eliminate cycles and reduce the diameter of the network by merging small modules. The paragraphs that follow describe each major step of mONTul.

Dependency Graph. Given an ontology O in the SHIQ fragment of Description Logics, the dependency graph for that ontology is a directed graph G = E, V , where V is a set of nodes corresponding to concepts (concept names and complex concepts) and roles, and a set of unidirectional dependency edges E connecting the nodes. Each node for a concept C in the graph is associated with a state variable S C . The state S R of a node corresponding to an ontology role R, comprises two variables S R f rom and S Rto , indicating the module to which R belongs and the module linked via R, respectively. All state variables range to a finite subset of natural numbers D and their values specify whether dependent ontology terms must appear in the signature of the same module or not. The types of edges in the dependency graph are according to the constructors available in the SHIQ fragment of Description Logics: A concept C is related to a role R via an edge of type Q−dep, if C is of the form QR.D, where Q ∈ {∀, ∃, ≤ n, ≥ n}.

[E3:]

A role R is related to a concept C via an edge values − dep, if there is a restriction of the form QR.C, where Q ∈ {∀, ∃, ≤ n, ≥ n}.

[E4:]

A node v R corresponding to a role R can be related to a node v S corresponding to a role S as follows: (a) with a subr − dependency in case R S, or (b) with an inv − dependency in case R is the inverse of S. (c) Transitivity of roles do not impose further dependencies.

Any C ≡ D and disjoint(C, D) axiom can be expressed using subsumption relations. Figure 1 presents the dependency graph for the example ontology.

Edges in the dependency graph are associated with constraints that specify how the states of adjacent nodes can be assigned values from D. These constraints can be distinguished into generic constraints (GC) and locality preserving constraints (LC). While the former suffice for the construction of a E − SHIQ distributed knowledge base, the later encode the rules in formulae (9) and are necessary for computing ⊥ − E−local modules M for the signature Sig(O)\Sig(M). Specifically, the constraints are the following: It must be noticed that GC3 can be omitted. In this case we can specify onto correspondences between concepts in different modules resulting to modules M that are ⊥ − E−local for Sig(O)\Sig(M). Regarding the locality preserving constraints, these aim to construct positive ⊥−concepts for the module M according to formulae (9), for the signature Sig(O)\Sig(M).

[GC1:] If there is a Q − dep between a node v C and a node v R , then it must hold that S C = S Rfrom . [GC2:] If

Regarding specifications of the form QR.C, where Q ∈ {∀, ≤ n}, checking the context where they appear is necessary. Given that we primarily aim at efficient reasoning, in this work we choose to sacrifice locality in cases where this is affected by this type of specification for the efficiency and simplicity of the modularization method. Therefore, to deal with specifications of the form QR.C, where Q ∈ {∀, ≤ n} the method considers the following constraint: This constraint (in contrast to the others) aims to make the concept QR.C with Q ∈ {∀, ≤ n} negative ⊥− concepts for the signature Sig(O)\Sig(M).

Now, given all the constraints specified, we need to point out that not all constraints can be satisfied if they occur in a specific problem instance. Therefore we can distinguish between soft and hard constraints: Given our decomposition purpose, GC constraints (except GC3) are considered hard in any case. Configurations with GC constraints being soft are beyond the scope of this work.

Constraint Satisfaction. During the construction of the dependency graph, the axioms of the ontology are consulted and parsed to detect constituent concepts and roles. For each new concept or role, the method creates a new dependency graph node and it assigns a value to its state variable from D, so as to satisfy the constraints associated to it. This task ends with (probably many) violated hard constraints, which are resolved using a CSP solver, with initial values being those already assigned to the state variables during the construction of the dependency graph. Given the assignments computed by the CSP solver, a hill-climbing algorithm minimizes the number of the remaining violated constraints (if any).

Module Construction. Once the state values of the nodes are decided, dependency graph nodes are clustered into groups. Based on that, the method can decide on the signature of each module. Each group contains those concept names and properties whose state variables (S C for concepts and S R f rom for properties) are equal, and the corresponding nodes are connected via a path in the dependency graph.

At this stage, the method considers additional desiderata concerning the reasoner's performance, i.e. the distribution of axioms, the diameter of the network and the size of cycles of associated modules. For each axiom α in the ontology, the module M that will host this axiom is the one that maximizes the function:

U (M, α) = |Sig(M)|+|Sig(M)∩Sig(α)| |T (M)|+1+|Sig(α)|−|Sig(M)∩Sig(α)| ,

where Sig(M) is the signature of M, T (M) is the set of axioms already in M, and Sig(α) is the set of terms in the axiom α. The intuition is to find the module M that includes most of the terms in Sig(α), subject to the restriction that the number of axioms and correspondences in M (counted in the denominator) will be kept low. Once the module that will host the axiom is decided, the process will construct any correspondences and link-relations to connect terms already in other modules. This happens without affecting the locality of modules, given that no locality constraint has been violated.

Finally, the Node Merging (NM) tuning method reduces the number of modules in the network (which in the general case also reduces the diameter of the network), improving the even distribution of axioms in the modules. Candidate modules to be merged are those (a) with a common neighbor module, and (b) with terms corresponding to dependency graph nodes with the same state. Modules are merged, if the new network will have lower coefficient of variation of the distribution of axioms in the modules. The algorithm iterates until there are no merging actions that can be performed. Merging actions do not violate locality, since, given two ⊥−E−local modules M i , M j , for Sig * (O\M i ) and Sig * (O\M j ) respectively, M i ∪ M j is also a ⊥ − E−local module for Sig * (O\(M i ∪ M j )).

It can be proved that the modularization method computes an E − SHIQ distributed knowledge base Σ, satisfying correctness and completeness of reasoning: by considering each unresolved constraint, and the actions taken to fix the conflict, by associating modules. It follows that the proposed method is a reduction of any SHIQ ontology O onto an equivalent E − SHIQ Σ, and given that the E − SHIQ reasoner is sound and complete [9], the modularization method satisfies correctness and completeness of reasoning.

Experimental Results

To experiment with the proposed modularization method, we have gathered ontologies of various size and expressiveness by crawling web ontology repositories1 . From these ontologies, we present the results concerning 18 ontologies. The corpus is extended with six versions of the Semantic Information System (SIS) registry [11] which are of considerable size and expressiveness.

The 24 ontologies are categorized by their size: small (between 100-499 axioms), medium (between 500-4999 axioms), large (between 5000-9999 axioms) and SIS ontologies (10000 and more axioms). The expressiveness of the language used in all categories varies from ALC to SHIQ. All SIS ontologies are within the ALCHIQ fragment of Description Logics.

In a previous work [10] we have evaluated montul evaluated against the locality based module extraction algorithm reported in [3], and the results have shown that mONTul computes smaller and less modules. In this paper, the experimental cases concern specific configurations for each ontology. Each configuration depends on the sets of constraints used (and their distinction as hard or soft), the size of the state variables domain D varying in the set {2, 10, 20, 40}, and the use of the network tuning method. We distinguish between the configuration of type (A), where constraints GC1,GC2,GC4,LC1,LC2,LC3 are hard and the N M tuning method is used; of type (B) where GC1,GC2,GC4 are hard, LC1,LC2,LC3,LC4 are soft, and the N M tuning methods is used, and finally; type (C) which is the same as (B) but without the tuning method. Each configuration is denoted by a template of the form "CaseType" "number of states". E.g. "A 20" denotes type A with |D|=20. Ontologies are ordered in the x axis by the (ascending) number of modules produced by the A 2 configuration. Succinctly, configurations A and B compute decompositions with no significant differences in all cases, although we may notice that configurations A present a larger number of cycles in some experimental cases, independently from the size of D. Configurations of type C, compared to A and B, compute decompositions with larger number of modules/cycles, and uneven distribution of axioms in modules (figures indicate the coefficient of variation of the distribution). This shows the necessity for tuning the decompositions constructed. Furthermore, as Figure 3 shows, while the size of D increases, the results get worst w.r.t. our requirements: This is due to the fact that larger D provides more degrees of freedom for the decomposition, thus more modules, greater diameter and more cycles in the network of associated modules.

Concluding Remarks

Given an ontology within the SHIQ fragment of Description Logics, this paper proposes the mONTul method for partitioning this ontology into an arbitrary number of modules that (a) form a distributed E − SHIQ knowledge base, such that (b) it can be used for correct and complete reasoning, and (c) each module preserves to a great extent the meaning of the terms in its signature. Future work concerns further investigation on the use of different sets of (hard and soft) constraints and methods for tuning the network of associated modules.

7) where A + is a concept name and R + a role in Sig(O)\S, C ∈ Sig(O) and R ∈ Sig(O). It must be noticed that given that S ⊆ Sig(O), a concept or role in Sig(O) may not be in C + S or in C − S . The negative ⊥-concepts C − S for S are as follows: C − S ::

Arole inclusion axiom R + R or a transitivity axiom T rans(R + ) is local w.r.t. S. A GCI is local w.r.t. S if it is either of the form C + S C or of the form C C − S . A module M of an ontology is ⊥−local for a signature S, iff all its axioms are local for S ∪ Sig(M).

Let O be an ontology and let S be a signature, S ⊆ Sig(O). M ⊆ O is a ⊥ − E-module for O for S, if O\M is ⊥-local for the restricted signature S * = S ∪ Sig * (M), where Sig * (M) = Sig(M)-{C| C is a concept name and there exists a correspondence of the form C → C from the subjective point of view of O\M}.

Figure 1 .1Figure 1. The dependency graph for the example ontology

[E1:] Two nodes v C and v D representing concepts C and D can be connected: (a) With an edge of type subc − dep if C D, (b) with an edge of type cap − dep, if C is of the form i D i , i = 1, 2... and there is a D k , s.t. D = D k (the equality specifies equality between the terms lexicalizing the concepts), (c) with an edge of type cup − dep, if C is of the form i D i , i = 1, 2... and there is a D k , s.t. D = D k , (d) with an edge of type compl, in case C and D are concept names and C is of the form (¬D), or [E2:]

there is a values − dep between a node v R and a node v C , then it must hold that S C = S Rto [GC3:] If there is a subc − dep between nodes v C and v D , then it must hold that S C = S D . [GC4:] If there is a subr − dep or inv − dep between nodes v R and v S , then it must hold that S Sfrom = S Rfrom and S Sto = S Rto [LC1:] If there is a cap − dep between nodes v C and v D , and D is a complex concept, then S C = S D . If D is a concept name, then it must hold that S C = S D . [LC2:] If there is a cup − dep between nodes v C and v D , then it must hold that S C = S D . [LC3:] If there is an edge Q-dep between C and R, where Q ∈ {∃, ≥ n} and there is no Q-dep between C' and R where Q ∈ {∀, ≤ n}, then it must hold that S Rfrom = S Rto .

[LC4:]If there is an edge Q − dep between C and R, where Q ∈ {∀, ≤ n, }, then it may hold that S Rfrom = S Rto , even if there is a Q-dep between C' and R, where Q ∈ {∃, ≥ n}.

Figures 2 and 3 show the results of different mONTul configurations w.r.t. our desiderata for (a) small diameter of the network of modules, (b) small number of cycles of associated modules, (c) balanced distribution of axioms in the modules. Due to space restrictions, Figure 2 presents results only for |D| = 2, while comparative results for larger D are shown only for the A configuration in Figure 3: These are representative for the results reported from configurations B and C.

Figure 2 .2Figure 2. Comparative results for A/B/C 2 configurations and for all networks: (a) Number of modules, (b) Network diameter, (c) Number of cycles, (d) Distribution of axioms.

Figure 3 .3Figure 3. Comparative results for (a) number of modules, (b) network diameter, (c) number of cycles, (d) distribution of axioms for all networks and A |D| configurations. (June 25th, 2013), http://bioportal.bioontology.org/, http://www.cs.ox.ac.uk/isg/ontologies/ and http://owl.cs.manchester.ac.uk/owlcorpus

A Logical Framework for Modularity of Ontologies CuencaBernardo IanGrau YevgenyHorrocks UlrikeKazakov Sattler proc. of IJCAI 2007 of IJCAI 2007 Just the right amount: extracting modules from ontologies CuencaBernardo IanGrau YevgenyHorrocks UlrikeKazakov Sattler Proc. of WWW ' of WWW ' 2007 07 Modular reuse of ontologies: Theory and practice CuencaBernardo IanGrau YevgenyHorrocks UlrikeKazakov Sattler JAIR 31 1 2008 Logical Difference and Module Extraction with CEX and MEX, Description Logics BorisKonev CarstenLutz WaltherDirk FrankWolter CEUR Workshop Proc 2008 353 Conservative Extensions in Expressive Description Logics CarstenLutz DirkWalther FrankWolter Proc. of IJCAI 2007 of IJCAI 2007 Survey of modular ontology techniques and their applications in the biomedical domain JyotishmanPathak ThomasMJohnson ChristopherGChute Integr. Comput.-Aided Eng., IOS Press 16 3 2009 Web ontology segmentation: analysis, classification and use JulianSeidenberg AlanRector Proc. of the 15th Int. Conf. on World Wide Web of the 15th Int. Conf. on World Wide Web ACM Press 2006 Structure-Based Partitioning of Large Ontologies HeinerStuckenschmidt AnneSchlicht Modular Ontologies 2009 5445 GeorgeAVouros MGeorgios cs.AI/1310.2493 Santipantakis: Combining Ontologies with Correspondences and Link Relations: The E-SHIQ Representation Framework 2013 Modularizing Ontologies for the Construction of E − SHIQ Distributed Knowledge Bases MGeorgios GeorgeASantipantakis Vouros SETN 2014 A semantic information system for services and traded resources in Grid e-markets GeorgeAVouros FGCS 26 7 2010