<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Computing OWL Ontology Decompositions Using Resolution</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Robert Schiaffino</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Achille Fokoue</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aditya Kalyanpur</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aaron Kershenbaum</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Li Ma</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chintan Patel</string-name>
          <email>chintan.patel@dbmi.columbia.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edith Schonberg</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kavitha Srinivas</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Columbia University Medical Center</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IBM China Research Lab</institution>
          ,
          <addr-line>Beijing 100094</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>IBM Watson Research Center</institution>
          ,
          <addr-line>P.O.Box 704, Yorktown Heights, NY 10598, USA achille, aaronk, ediths</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Iona College</institution>
          ,
          <addr-line>New Rochelle, NY 10000</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Reasoning over large ontologies can be done more effectively if they can be decomposed into smaller parts which can be reasoned on independently. This requires identifying parts of the ontology relevant to the problem at hand. We present a novel algorithm, based on resolution calculus, for decomposing an OWL ontology into smaller, more manageable components, such that the union of reasoning over each of these components separately is the same as reasoning over the original ontology. We describe our computational experience using the algorithm, and demonstrate that it is indeed possible to efficiently solve the standard concept subsumption reasoning problem in four large real-world OWL ontologies: SNOMED-CT, NCI, SWEET-JPL and GALEN. We chose SNOMED-CT and NCI because of their size; GALEN because it is highly interconnected; SWEET-JPL because it is expressive (containing negation, both existential and universal quantifiers and both intersections and unions).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Ontologies that model domains in the real world are often very large, pushing
the limits of existing reasoners. For example, SNOMED CT5, has over 300,000
concepts and approximately the same number of axioms. To reduce the cost
of reasoning over large ontologies, we propose a sound and complete technique
which allows us to work only with axioms and concepts relevant to the current
task. In some cases, this may also involve merging results obtained from parts
of a decomposed ontology.</p>
      <p>In this paper, we focus on OWL ontologies, specifically the TBox which
contains terminological assertions about concepts and the RBox which contains
assertions about roles and role hierarchies. Furthermore, we focus on the standard</p>
    </sec>
    <sec id="sec-2">
      <title>5 http://www.snomed.org/snomedct/index.html</title>
      <p>reasoning task of computing the concept subsumption hierarchy. Thus, our goal
is to decompose the original OWL ontology into smaller components (that are
not necessarily disjoint), such that performing subsumption reasoning on each
of the components separately and aggregating the results produces the same
subsumption hierarchy as reasoning over the entire ontology.</p>
      <p>The approach we take is based on resolution calculus. Resolution is a
standard, sound and refutation-complete6 inference procedure for Propositional and
First Order Logic. This allows our decomposition strategy to be applicable to any
fragment of FOL, such as SHOIN , which is the description logic that OWL-DL
corresponds to. However, in this paper, we restrict our attention to SHIN .</p>
      <p>
        Before performing reasoning, we use resolution to detect which axioms in
the KB are relevant to infer a particular subsumption entailment, and form
components based on these axioms. The approach is conservative in that it may
produce non-minimal components, i.e., the components may include additional
axioms that are not relevant to the subsumption entailment under consideration.
However, in our experiments, we have observed that we still obtain substantial
gains in runtime and memory by doing decomposition. Next we perform the
actual reasoning task on each component, using Pellet [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], a sound and complete
DL-optimized reasoner.
      </p>
      <p>To summarize, the contributions of this paper are:
1. We provide a theoretical framework for ontology decomposition by defining
the notion of an inconsistency-preserving and subclass-preserving sub-KB,
i.e., fragments of a KB that preserve key logical entailments.
2. We present a novel algorithm based on logical resolution to compute such
fragments for OWL ontologies.
3. We demonstrate the practical significance of this algorithm for decomposing
OWL TBoxes and RBoxes for the purpose of subsumption reasoning on
four large real-world OWL ontologies, SNOMED-CT, NCI, SWEET-JPL
and GALEN,and show that reasoning time and memory are significantly
reduced.
2</p>
      <sec id="sec-2-1">
        <title>Approach</title>
        <p>We first focus on the general problem that forms the basis of our ontology
decomposition technique: given a DL knowledge base which may be unsatisfiable,
find a fragment of this KB that contains sufficient axioms to prove its
inconsistency. The motivation here is that KB consistency checking is a key entailment
problem in OWL-DL since all standard reasoning tasks can be reduced to it.
Thus, the goal of decomposing a KB when checking a specific entailment (such
as concept subsumption) is to produce relatively smaller fragments of the KB
that are sufficient to prove the entailment. We first apply our approach to
ontologies containing only Horn clauses (i.e., clauses containing at most one of
positive literal) and then extend it to non-Horn clauses.
6 i.e. for an unsatisfiable set of FOL clauses, there is a derivation of the empty clause
2.1</p>
        <sec id="sec-2-1-1">
          <title>Extracting an Inconsistency-Preserving sub-KB</title>
          <p>Definition 1. Given a knowledge base K, an inconsistency-preserving sub-KB
K0 is a subset of K that satisfies the following property K |= ⊥ iff K0 |= ⊥.</p>
          <p>Note that the above definition does not make any guarantees about the size
or the minimality of K0. However, in practice, we would like our algorithm to
produce a K0 that is relatively small compared to the original KB.</p>
          <p>Our approach is based on FOL resolution. Since SHIN is a subset of FOL,
it is possible to rewrite a SHIN KB, say K1, as a set of FOL clauses K2. If K1
is unsatisfiable, then K2 is obviously unsatisfiable as well, and in particular, an
FOL-based resolution procedure when applied to K2 must produce the empty
clause. Thus, a precise solution to determining an inconsistency-preserving
subKB is to apply FOL resolution to K2 and determine which clauses in K2 play a
role in generating the empty clause.</p>
          <p>However, our approach does not require that we actually do the complete
resolution. Instead, we compute a conservative approximation, guaranteeing that
we will never fail to recognize when the empty clause can be derived. We may,
however, sometimes conclude that it can be derived when in fact it cannot. In
practice, this turns out to be a good tradeoff as our approach is much faster than
doing a full resolution, and in our experiments, we have seen that the fragments
we extract are still small enough to allow us to obtain significant reductions
in reasoning time. Having obtained a decomposition using this conservative
approximation, we then perform sound and complete DL reasoning.</p>
          <p>Our conservative approach is based on abstracting the FOL KB into
Propositional Logic (PL), i.e., we get rid of all the variables and constants in the
clauses, and treat each of the predicates as propositions instead. Thus, this
abstraction ignores equalities as well as individuals. The approach is described in
Table 3. Some of these transformations necessitate restoring the expression to
CNF, which we do. Below is an example of how a DL axiom is converted into
an equivalent FOL clause and then abstracted into PL:</p>
          <p>(DL) C v ∀R.D → (FOL) ∀x, y, ¬C(x) ∨ ¬R(x, y) ∨ D(y) * (PL) ¬C ∨
¬R ∨ D</p>
          <p>Next, we apply unit-resolution to the resulting approximate Horn
propositional KB. Unit resolution is a sound and complete resolution procedure for
Horn-clause KBs, where each resolution step must include at least one unit
clause, i.e., a clause containing a single literal. If applying unit-resolution over
the PL KB derives an empty clause, we record the complete set of
responsible clauses, and obtain the corresponding DL axioms that they were abstracted
from.</p>
          <p>However, there is an important caveat here. Since we are conservative in our
approximation of the KB, it is possible that unit resolution produces an empty
clause when their corresponding DL axioms are not inconsistent. In such a case,
the output of our algorithm is not an inconsistency-preserving sub-KB. In
order for our solution to be complete, we need to modify unit resolution to find
all possible derivations of the empty clause. In practice, the performance of the
algorithm is not affected much by the modification, since typically there are a
small number of derivations of the empty clause, and unit resolution has
polynomial complexity and is fast in practice. An outline of our Approx-Resolution
algorithm is given in Table 1.</p>
          <p>Algorithm: Approx-Resolution
Input: Horn − SHIN KB K1
Output: Horn − SHIN KB K2
1. Convert K1 into an equivalent (unsatisfiability preserving) set of FOL clauses K1F OL as shown in Table 2
2. Abstract the FOL KB K1F OL into a propositional logic KB K1P L by removing
all variables/constants from the clauses in K1F OL</p>
          <p>Let K1P L−Horn be corresponding pure Horn KB formed.
3. (This step is used only in the extension to non-Horn clauses as described in Section 2.4)
Replace each non-Horn clause in K1P L of the form (¬P1 ∨ .. ∨ ¬Pn) ∨ (Q1 ∨ .. ∨ Qm),</p>
          <p>with ‘m’ new clauses of the form (¬P1 ∨ .. ∨ ¬Pn) ∨ Qi (1 ≤ i ≤ m).
4. Perform unit resolution on K1P L−Horn until no more empty clauses can be derived</p>
          <p>K2 ← K2 ∪ β
return K2
set K2 ← ∅
for each empty clause derived,
let (K1P L−Horn)⊥ ⊆ K1P L−Horn denote clauses responsible for deriving ⊥
for each clause α ∈ (K1P L−Horn)⊥,
β ← DL axiom in K1 that α was abstracted from</p>
          <p>The following theorem, which is stated without proof, shows that this
algorithm produces a valid inconsistency-preserving sub-KB. The interested reader
is referred to our Tech Report7 for details of the proof.
7 http://domino.research.ibm.com/comm/research projects.nsf/pages/iaa.index.html/$FILE/techReportDecomp.pdf.
Theorem 1. Let K1 be a Horn − SHIN KB (reference!!!) and K2
=ApproxResolution(K1). Then, K2 is an inconsistency-preserving sub-KB of K1, i.e.,
K2 ⊆ K1 and K1 |= ⊥ iff K2 |= ⊥.
2.2</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>Using Approx-Resolution in Ontology Decomposition</title>
          <p>We describe how the Approx-Resolution algorithm is used to compute ontology
decompositions for the purpose of subsumption reasoning. Again, our goal is
to decompose the original ontology into a set of smaller components, perform
subsumption reasoning independently over each of the components, and merge
the results at the end to obtain the complete subsumption hierarchy.
Definition 2. Let K be a consistent KB containing named concepts C1, ..Cn.
A subclass-preserving sub-KB Ksub(Ci) w.r.t. a named concept Ci is a subset of
K that satisfies the following property: K |= (Ci v Cj ) 1 ≤ j ≤ n, j 6= i, iff
Ksub(Ci) |= (Ci v Cj )</p>
          <p>Just as in Section 2.1, we would like to produce sub-KBs that are as small as
possible. Based on the above definition, we can see that if we have an efficient
technique to compute Ksub(Ci), we can obtain and reason over a smaller fragment
sufficient to compute all Ci’s subsumers. We can then repeat the process for
all named concepts {C1, ..Cn} in the KB, and derive the entire subsumption
hierarchy.</p>
          <p>We now describe an approach to compute Ksub(Ci). Note that in general,
we can prove (by refutation) that a concept Cj subsumes another concept Ci in
a consistent SHIN KB K iff the complex concept (Ci u ¬Cj ) is unsatisfiable
in K, i.e., if we construct K0 ← K ∪ (Ci u ¬Cj )(a) |= ⊥, where a is a new
individual not present in K, then K0 is inconsistent. This principle gives us a
direct solution to finding Ksub(Ci) using the Approx-Resolution algorithm as
shown in the description of the Decomposition Algorithm (3):
Theorem 2. Let K1 be a Horn−SHIN KB containing named concepts C1, ..Cn
and let K2 ← Decompose(K1, Ci) for some i, 1 ≤ i ≤ n. Then, K2 is a
subclasspreserving sub-KB of K1 w.r.t Ci
Proof. Consider any one KB Kj constructed in step 1 of the algorithm. Let
Kj0 ← Approx-Resolution(Kj ). By Theorem 1, Kj0 is an inconsistency-preserving
sub-KB of Kj , and thus Kj |= Ci v Cj iff Kj0 |= ⊥, i.e., Kj0 can be used to check
whether Cj is a subsumer of Ci. K2, constructed in step 3, simply represents the
sum union of the output of Approx-Resolution for each Kj (1 ≤ j ≤ n), and thus
for any j, Kj |= Ci v Cj iff K2 |= ⊥. This implies that K2 is a subclass-preserving
sub-KB of K1 w.r.t Ci.</p>
          <p>Now, we do not have to actually construct all (n-1) KBs in order to compute
Ksub(Ci). A more efficient way to solve this problem is given in Table 4.</p>
          <p>In the optimized version of the decomposition algorithm the concept ‘X’ plays
the role of a ‘variable’ that can take any named concept value, and hence a single
run of Approx-Resolution is sufficient to find all named concept subsumers of Ci.</p>
          <p>Controlling Number of Components in Ontology Decomposition
The decomposition algorithms presented so far satisfy our key goal of producing
a smaller version of the KB on which the subsumption test can be performed.
However, to compute the entire subsumption hierarchy, subsumption reasoning
must be performed on all the components. For large ontologies, it is not practical
to decompose the ontology for each concept. Thus, an important property of our
decomposition is to ensure that the number of components produced by our
decomposition is not too large. The following Lemma implies that if we consider
the asserted ontology class hierarchy as a graph, we only need to build
subclasspreserving sub-KBs for the leaf nodes of this graph.</p>
          <p>Lemma 1. Let K be a consistent KB containing named concepts C1, ..Cn, and
suppose K |= Ci v Cj . Then, Optimized-Decompose(K, Cj ) ⊆
OptimizedDecompose(K, Ci).</p>
          <p>Thus, the entire inferred class hierarchy can be computed bottom-up using
only the decompositions for the leaf node concepts. Moreover, if an ontology has
numerous leaf nodes, we can merge components for ‘closely related’ leaf-nodes.
One possibility is to perform a standard stack-based depth-first search over the
entire asserted class hierarchy, record the time at which a concept leaves the
stack, and consider concepts with short time-stamp gaps to be closely related.
This is the approach we use in our implementation discussed in the next section.
2.4</p>
        </sec>
        <sec id="sec-2-1-3">
          <title>Extension to Non-Horn Clauses</title>
          <p>In this section, we establish that Approx-Resolution, which computes an
inconsistencypreserving sub-KB and whose correctness has already been proven for
HornSHIN KBs, can be extended to SHIN KBs that are not Horn. This result
relies in part on the transformation performed by adding Step 3 to
ApproxResolution. This step approximates a non-Horn clause by a set of Horn clauses.
From the correctness of Approx-Resolution on any SHIN KB, it directly
follows that Decompose and Optimized-Decompose can also be correctly applied
to any SHIN KB (even non-Horn-SHIN KB). The validity of this extension
is given by the theorem stated below without proof. The interested reader is
referred to our Tech Report8.</p>
          <p>Theorem 3. Given a TBox T that can be described by a CNF expression E, in
general including both Horn and non-Horn clauses, and containing an
inconsistency. Let E’ the expression obtained from E by applying the third step of the
Approx-Resolution Algorithm to non-Horn clauses. E’ is comprised entirely of
Horn clauses. Furthermore, the union U’ of the unit resolutions of the empty
clause from E’ is such that, for each minimal derivation D of the empty clause
in E and for each clause C in D, there is a clause C’ used in U’ such that either
C’= C or C’ is one of the Horn-clauses derived from C by applying Step 3 of
Approx-Resolution to C.
2.5</p>
        </sec>
        <sec id="sec-2-1-4">
          <title>Example</title>
          <p>We demonstrate how our algorithm works with an example. Consider a SHIN
KB K containing atomic concepts A − G, atomic roles R, S and the 6 axioms
given below:
1 : A v B u ∀R.C u D
4 : D v≥ 1.S
2 : E t F v B
5 : ∃R.C u B v G
3 : F v A t G
6 : S v R
Our goal is to find all atomic concept subsumers of A.</p>
          <p>We first convert K into an equivalent FOL KB KF OL (see Table 5), where the
DL axioms have been replaced by equivalent FOL clauses (shown in the second
column).</p>
          <p>In the table, each FOL clause has an implied universal quantification over
each of the variables in it. For example, consider the part A v ...∀R.C.. in
the DL axiom 1. Here, the universal restriction in the RHS translates into the
8 http://domino.research.ibm.com/comm/research projects.nsf/pages/iaa.index.html/$FILE/techReportDecomp.pdf
implication ∀x, ∀y : R(x, y) → C(y), and in combination with A on the LHS of
the axiom produces the disjunctive FOL clause ∀x, ∀y : ¬A(x) ∨ ¬R(x, y) ∨ C(y).</p>
          <p>Note that the disjunction in the RHS of axiom 3 and the conjunction in the
LHS of axiom 5 produces non-Horn FOL clauses for both these cases. Also, the
presence of the min-cardinality restriction on role S in axiom 4 gives rise to two
skolem functions f 1(x), f 2(x) for creating at least two distinct S−successors of
x.</p>
          <p>After creating KF OL, we approximate it into a propositional KB KP L (shown
in the third column of Table 5), by ignoring all variables and constants appearing
in the FOL clauses. Finally, the non-Horn PL clauses in KP L are approximated
further into Horn clauses by treating the disjunction of positive literals as a
conjunction, and thus splitting the clauses further as shown in the last column
of Table 5, producing KB KP L−Horn.</p>
          <p>Now, our goal is to find A v X, where X stands for any atomic concept
(except A) in K. Thus, we follow the standard proof-by-refutation procedure of
adding A ∧ ¬X to KP L−Horn (where X acts as a ‘variable’ for other atomic
concepts), and finding all inconsistencies due to this addition. That is, we perform
unit resolution over the modified Horn KB to find all derivations of ⊥.</p>
          <p>Table 6 shows the sets of clauses that resolve to ⊥, and the corresponding
literal that resolves with ¬X in each derivation. Note that the superscript on
each PL clause depicts the original DL axiom it was derived from.</p>
          <p>Thus, the algorithm finds potential concept subsumers of A: B, D, C, G (since
R, S are actually atomic roles in K) and the axioms necessary to derive
accurate subsumers of A: {1, 4, 5, 6} (Note: axioms 2, 3 are excluded). These four
axioms are then fed to a sound and complete tableau reasoner to derive the real
subsumers of A: B, D, G.
3</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Computational Experience</title>
        <p>We performed experiments on an Intel Pentium-M 1.6GHz processor with 2GB
RAM running Windows XP on the following OWL-DL ontologies:</p>
        <p>Ontology
SNOMED CT</p>
        <p>NCI</p>
        <p>Galen
SWEET-JPL
#concepts (NC) #axioms (NA) expressivity
379638 379640 Horn − ALC
59898 86574 Horn − ALC
2749 4353 Horn − SHF
1639 1807 N on − Horn − SHIOF (D)</p>
        <p>Table 7. Ontologies Decomposed</p>
        <p>SNOMED CT and NCI9 are large and popular OWL ontologies in the medical
domain. Galen10 and NASA’s SWEET-JPL ontology11 use almost all of
OWLDL expressivity. Since we are not dealing with ABoxes, we modified SWEET-JPL
by converting the 56 hasValue constraints to someValuesFrom constraints and
all instances of types other than Class to type Class; this increases the size of
the ontology slightly.</p>
        <p>We define two metrics α and β, which measure how well we have decomposed
the and the overhead incurred by decomposing respectively. Specifically, α = NL
/ NA and β = ND / NC, where N A is number of axioms in the ontology, N C
is the number of named concepts, N L is the number of axioms in the largest
component after decomposition, and N D is the total number of concepts in all
the components. Ideally, we would like both α and β to be as small as possible,
although we expect a tradeoff: if we increase the granularity of our decomposition
so that individual components contain fewer axioms (causing α to decrease), we
expect the total number of classes across all components to increase (forcing β
to increase), resulting in more subsumption tests on smaller sized KBs to obtain
the entire class subsumption hierarchy.</p>
        <p>The results of the resolution-based decomposition algorithm Optimize-Decompose
on SNOMED-CT are summarized in the Table 8. We used our algorithm using</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>9 http://www.mindswap.org/2003/CancerOntology/nciOncology.owl 10 http://www.cs.man.ac.uk/ horrocks/OWL/Ontologies/galen.owl 11 http://sweet.jpl.nasa.gov/ontology/</title>
      <p>depth frst search traversal, and in particular, the number of leaves, to vary the
number of components produced by our decomposition, in order to examine the
tradeoff between α and β.</p>
      <p>Table 8 (Row 1) shows that by using a leaf-width of 10000, SNOMED CT
can be decomposed into 76 components, the largest of which is around 10% the
size of the total ontology (α = 0.1)and only slightly more than doubling the
total number of concepts (β = 2.3). This is a very attractive tradeoff, since the
order of complexity of reasoning algorithms grows non-linearly with the size of
the problem, we should expect a significant reduction in the overall effort. Also,
the total runtime of the algorithm is between 3-4 minutes.</p>
      <p>
        To determine the quality of the components created by our decomposition,
we compared our components with top-level semantic biomedical categories. To
obtain top-level categories, we mapped SNOMED CT to UMLS [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which is a
meta-thesaurus for the medical domain and specifies 14 top level biomedical
categories. SNOMED-CT most frequently uses the semantic categories: Disorders,
Procedure, Chemicals and Drugs, Living Beings/Organisms, and Anatomy. We
analyzed the SNOMED-CT decomposition with 25 components to determine
the coverage of these top 5 categories across different components. The graph in
Figure 1 shows the distribution of the categories.
      </p>
      <p>The semantic categories are cleanly distributed across different components,
for example, Living Being appears only in components 1, 2, 3 and 21.
Furthermore, a set of categories never appear together in a single component, for
example Disorder and Chemicals are independent of one another. This has significant
implications for biomedical applications that only deal with specific subset of
concepts in SNOMED-CT, for example Laboratory data dealing with a specific
domain such as Living Beings, and that only cares about concept subsumption
information, which is typically the case, can use the components 1,2,3 and 21
only instead of the entire SNOMED CT. The result is also interesting from
perspective that the partitioning of SNOMED CT was performed based on a logic
procedure that resulted in domain-based partitions.</p>
      <p>We then did experiments to determine the effects of decomposition and
validate our implementation by checking that the class hierarchy remained
unchanged when the ontology was decomposed. Also, we wanted to examine the
improvements in running time and memory requirements obtainable via
decomposition.</p>
      <p>Table 9 shows the results of these experiments on NCI, GALEN, and
SWEETJPL; SNOMED-CT was not included because memory limitations prevented us
from running Pellet on it on our laptop computer. Also, for the same reason,
when considering NCI-THESAURUS, we removed leaf concepts (i.e. concepts
only involved in subsumption relationships and with no subsumees except other
leaf concepts). Times for T-decomp and T-reason in Table 9 are given in seconds.
As can be seen from examining Table 9, in all cases, the time to decompose the
ontology was roughly an order of magnitude smaller than the time to reason
over it. Also, in all cases, there was a significant reduction in reasoning time, as
measured by comparing the times for #comp = 1 (not decomposed) with the
times for #comp &gt; 1 (decomposed). In the case of JPL, the reduction was over
an order of magnitude. In the case of NCI, the reasoner was unable to compute
the derived class hierarchy at all within a 12 hour time period after which we
stopped the experiment.</p>
      <p>
        To compare the sizes of the decomposition components we obtained with
those obtained by [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], we formed the closures of each of the concepts in JPL,
GALEN and NCI-ONCOLOGY. The distribution of component sizes matched
closely with those in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. These results in Table 10 are not surprising as the
rules used in both approaches are very similar when applied to these ontologies.
The average sizes of the components for each of the three ontologies were also
very similar, with those obtained by our approach being roughly 10% smaller on
average.
The problem of decomposing OWL ontologies has received a lot of attention in
recent years. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] describes an algorithm for partitioning an ontology by treating
the ontology as a graph, and examining dependencies between classes that appear
as nodes in the graph based on certain heuristics. On a similar note, [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
provides techniques to extract ontology modules based on structural analysis
of the asserted ontology graph (i.e., traversing related classes and properties
through term definitions), and more recently, [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] uses structural analysis to get
rid of irrelevant axioms fed to an FOL Prover (Vampire) to perform standard DL
reasoning tasks such as classification. The main problem with these approaches
is that either they do not provide strong logical guarantees about the ‘module’
computed, or in cases where they do, the modules are larger than necessary due
to the conservative nature of structural analysis. For example, in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], all General
Concept Inclusion (GCI) axioms in the ontology are considered relevant for a
concept subsumption test, whereas our resolution-based analysis helps prune out
irrelevant GCIs as well.
      </p>
      <p>
        [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] presents a formal definition of module for an atomic concept that is more
general than ours, e.g., it entails all atomic subsumers as well as subsumees
of the concept. This causes the modules created using their approach to be
much larger than what we need. Also, the solution produces a strict partitioning
of the ontology, whereas we allow for possibly overlapping components for our
decomposition task.
      </p>
      <p>
        [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] defines modules for concepts that do not preserve all its atomic superclass
entailments, and also uses inferences computed by a reasoner in order to build
a module. For both these reasons, it does not fit in with our scenario.
      </p>
      <p>
        More recently, [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] describes a method for extracting modules based on the
notion of conservative extension of the ontology. The authors provide a definition
of an S-module (where S is a signature) of an ontology w.r.t a language L, show
its relation to model and deductive conservative extensions of ontologies, and also
describe an elegant technique to find syntactical locality-based S-module in the
ontology. Proposition 9 in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is directly related to our work, since it states that
locality-based modules preserve complete information to find all superclasses of
an atomic concept and thus can be used to optimize classification. However,
these modules preserve much more information (all entailments involving the
concept) than what is needed for classification, and thus may be larger. Also,
since we focus only on the subsumption entailment in a single direction, we can
make use of Lemma 1 to speed up the decomposition process by building Ksub
for leaf concepts alone, a property that does not hold for locality-based modules
defined in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] which requires computing modules for all atomic concepts in the
ontology. Finally, we note that since our definition of Ksub only preserves subclass
entailments, it is weaker than the general notion of conservative extension, i.e.,
K is not necessarily a conservative extension of Ksub(C) (for any concept C).
5
      </p>
      <sec id="sec-3-1">
        <title>Conclusions and Future Work</title>
        <p>We have presented a novel resolution-based algorithm for OWL ontology
decomposition, applied to the problem of determining the implied class hierarchy.
For the medical OWL ontology SNOMED CT, the resulting components are
approximately an order of magnitude smaller than the original ontology, while
the total number of concepts in all components of the decomposition is only a
small multiple of the number of concepts in the original ontology. Similar
positive results were seen in the decomposition of the more expressive JPL and NCI
ontologies.</p>
        <p>
          In the future, we would like to test the effects of decomposition using other
reasoners (e.g. RACER [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]) and to explore the additional gains obtainable from
incorporating decomposition directly into the reasoner itself.
        </p>
        <p>We would also like to extend our solution to include ABoxes as well, and apply
it to ABox reasoning tasks such as query answering. Applying our technique to
an ABox is a challenge, since all concept-membership assertions C(a) (resp. role
assertions R(a, b)) in the ABox are abstracted to the proposition C (resp. R).
This has the effect that if the proposition C (resp. R) is considered relevant to
generate the empty clause under unit resolution over the propositional KB, then
all concept-membership assertions involving C (resp. role assertions involving
R) will be included in the output KB. Since typically, the ABox is many orders
of magnitude larger than the TBox/RBox, this approach is clearly not feasible.</p>
        <p>
          To overcome this challenge, we plan to explore combining the TBox/RBox
decomposition algorithm (Approx-Resolution) with our earlier ABox
summarization technique [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], which dramatically reduces the size of an ABox needed for
reasoning. A summary ABox A0 is an abstraction of the original ABox A, such
that if A0 is consistent, then A must also be consistent, and thus we can often
reason on A0 instead of A12. Our goal is to try to apply Approx-Resolution to
the KB K0 ← T ∪ R ∪ A0.
        </p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Sirin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Pellet: An owl dl reasoner</article-title>
          .
          <source>In: Description Logics</source>
          . (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>DA</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Humphreys</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCray</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The unified medical language system</article-title>
          .
          <source>Methods Inf Med</source>
          .
          <volume>32</volume>
          (
          <issue>4</issue>
          ) (
          <year>1993</year>
          )
          <fpage>281</fpage>
          -
          <lpage>291</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kazakov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Just the right amount: Extracting modules from ontologies</article-title>
          .
          <source>In: WWW</source>
          . (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Stuckenschmidt</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , Klein.,
          <string-name>
            <surname>M.</surname>
          </string-name>
          :
          <article-title>Structure-based partitioning of large class hierarchies</article-title>
          .
          <source>In: ISWC</source>
          . (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Musen</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The prompt suite: Interactive tools for ontology merging and mapping</article-title>
          . In: International Journal of Human-Computer Studies. (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Seidenberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Rector.,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>Web ontology segmentation: Analysis, classification and use</article-title>
          . In: WWW. (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Tsarkov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riazanov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bechhofer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Using Vampire to reason with OWL</article-title>
          . In
          <string-name>
            <surname>McIlraith</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plexousakis</surname>
          </string-name>
          , D., van
          <string-name>
            <surname>Harmelen</surname>
          </string-name>
          , F., eds.
          <source>: Proc. of the 3rd International Semantic Web Conference (ISWC</source>
          <year>2004</year>
          ).
          <source>Number 3298 in Lecture Notes in Computer Science</source>
          , Springer (
          <year>2004</year>
          )
          <fpage>471</fpage>
          -
          <lpage>485</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sirin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalyanpur</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Automatic partitioning of owl ontologies using e-connections</article-title>
          .
          <source>In: Description Logics</source>
          . (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Mathieu d'Aquin</surname>
            ,
            <given-names>Marta</given-names>
          </string-name>
          <string-name>
            <surname>Sabou</surname>
            ,
            <given-names>E.M.:</given-names>
          </string-name>
          <article-title>Modularization: a key for the dynamic selection of relevant knowledge components</article-title>
          . In: First International Workshop on Modular Ontologies, ISWC. (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Haarslev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moller</surname>
          </string-name>
          , R.:
          <article-title>Racer system description</article-title>
          .
          <source>Conf. on Automated Reasoning (IJCAR)</source>
          (
          <year>2001</year>
          )
          <fpage>701</fpage>
          -
          <lpage>705</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>A.</given-names>
            <surname>Fokoue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kershenbaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ma</surname>
          </string-name>
          , E.Schonberg,
          <string-name>
            <surname>K.</surname>
          </string-name>
          <article-title>Srinivas: The summary abox:cutting ontologies down to size</article-title>
          .
          <source>Proc. of the Int. Semantic Web Conf. (ISWC)</source>
          (
          <year>2006</year>
          )
          <fpage>136</fpage>
          -
          <lpage>145</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>12</surname>
          </string-name>
          <article-title>The converse, however, is not true and if A0 is inconsistent, we use a refinement process which involves making A0 more precise w</article-title>
          .r.
          <article-title>t the original A, in order to confirm whether A is indeed inconsistent</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>