=Paper= {{Paper |id=Vol-350/paper-5 |storemode=property |title=Distributed Resolution for ALC - First Results |pdfUrl=https://ceur-ws.org/Vol-350/paper5.pdf |volume=Vol-350 }} ==Distributed Resolution for ALC - First Results== https://ceur-ws.org/Vol-350/paper5.pdf
                  Distributed Resolution for ALC -
                            First Results

                          Anne Schlicht, Heiner Stuckenschmidt

                             University of Mannheim, Germany
                         {anne,heiner}@informatik.uni-mannheim.de



       Abstract. The use of description logic as the basis for semantic web languages
       has led to new requirements with respect to scalable and non-standard reasoning.
       Description logic is a decidable fragment of FOL but still, the standard reason-
       ing tasks are of exponential complexity, satisfiability and subsumption tests are
       often intractable on large ontologies. Existing large ontologies have a modular
       structure like networks of linked ontologies, caused by the development process.
       However, current reasoning approaches do scarcely take advantage of this struc-
       ture. The available reasoners do not exploit parallel computation and scalability
       improvements enabled by distributed reasoning.
       In this paper, we lay the foundation for developing distributed reasoning methods
       by showing that the description logic fragment ALC can be distributed. We pro-
       pose a distributed, complete and terminating algorithm that decides satisfiability
       of terminologies in ALC. The algorithm is based on recent results on applying
       resolution to description logics. We show that the resolution procedure proposed
       by Tammet can be distributed amongst multiple resolution solvers by assigning
       unique sets of literals to individual solvers. This provides the basis for a highly
       scalable reasoning infrastructure for description logics.


1   Introduction
The use of description logics as one of the primary logical languages for knowledge rep-
resentation on the Web has created new challenges with respect to reasoning in these
logics. In order to be able to support the vision of a semantic web of interrelated ontolo-
gies, reasoning procedures have to be highly scalable and able to deal with physically
distributed knowledge models. A natural way of addressing these problems is to rely on
distributed inference procedures that can distribute the load between different solvers
thus reducing potential bottlenecks both in terms of memory and computation time.
Currently, almost all the work on description logic reasoning still assumes a central-
ized approach where the complete terminology has to be present on a single system
and all inference steps are are carried out on this system. Exceptions to this rule are
approaches like Distributed Description Logics (DDL) [2] that support local reasoning
at the price of sacrificing expressiveness in the links between local models and by drop-
ping some formal properties on the level of the overall model. In DDL for example,
certain types of inconsistencies are not propagated on the global level. The goal of our
work is to support local reasoning in description logics without the need to reduce the
expressiveness of links between local models. Based on results in resolution reasoning
      P air v Set                  {¬P air(x1 ), Set(x1 )}
      P air v ∀part.¬Set           {¬P air(x2 ), ¬part(x2 , x3 ), ¬Set(x3 )}
      Set v ∃part.Set              {¬Set(x4 ), part(x4 , f (x4 ))}, {¬Set(x4 ), Set(f (x4 ))}
      P air(a)                     {P air(a)}


Fig. 1. ALC example and equisatisfiable clauses. This example states that every pair is a set,
every part of a pair is not a set and every set has a part that is a set. For testing satisfiability of
the concept P air we additionally state that it has an instance a. The translation to FOL clauses
requires replacing the existencial quantified variable y with a skolem function.


for description logic [8], we present an algorithm that decides satisfiability of a set of
ALC ontologies. This algorithm allows us to provide distributed reasoning support on
sets of terminologies that share non-logical symbols without merging the modules. A
possible application is the provision of distributed reasoning support for (ALC equiva-
lent parts of) OWL ontologies linked by import statements. Our method guarantees that
the global semantics is preserved.

    The contribution of this paper is threefold: (1) We identify general requirements
for a resolution procedure needed to guaranteeing soundness and completeness. (2) we
show that the resolution method proposed in [8] satisfies these requirements. (3) we
present first results on the practical applicability of the algorithm. The paper is orga-
nized as follows: after a brief discussion of related work in the remainder of this sec-
tion, we first explain the general idea of distributing standard resolution and discuss
the problems that have to be solved for guaranteeing completeness. We then discuss or-
dered resolution as a adequate basis for distribution and its adaptation to ALC knowl-
edge bases before turning to the distributed ordered resolution algorithm. Finally, we
present the first results of this method and discuss possible optimizations.


1.1   Related Work

Modular DL Reasoning Current approaches to distributed reasoning on description
logic mostly rely on tableaux methods. Distribution by solving the two choices of non-
determinitic tableaux rules in parallel is difficult as it hampers the application of op-
timization and blocking strategies. Instead most distributed tableaux approaches try to
identify all possible conflicts, i.e. all axioms that might follow from another module and
would cause a contradiction and send these as queries to the other modules. So far, this
is only done for links with rather restricted expressiveness between the modules.
The most prominent actually distributed T-Box reasoning implementation for ontolo-
gies Distributed Description Logic (DDL) [2] supports only a special type of links
(called bridge rules) between ontologies. The local domains have to be disjoint, i.e.
there is no real subsumption between elements of different modules. Like DDL, E-
Connections [3] treat local domains as disjoint and do not support subsumption rela-
tions between modules.
Currently, a modular ontology framework based on Conservative Extensions [5] is
in development. This framework, however, imposes severe requirements to the self-
containement of the modules. Reasoning with these modules is relatively easy but the
computational difficulties are transfered to checking and maintaining the properties of
the modules. We think that this approach overshoots the mark and poses too many re-
strictions on the way axioms might be distributed across different sites.



Distributed Resolution Methods Resolution is used by all successful FOL provers.
Approaches to distributed first order reasoning are motivated by efficiency consider-
ations, performance is improved by using multiple processors in parallel. Roo[4], for
example, is a parallelization of the widely (re)used first order reasoner Otter.
Parallelization differs from the distribution setting, while the former usually uses a
shared memory, the latter constitutes a physical separation of modules and resulting
higher communication costs. Partition-Based Reasoning [1] is a distributed resolution
strategy that requires local reasoning to be complete for consequence finding. The re-
quirement inevitably causes derivation of much more clauses than necessary for refuta-
tion.


Resolution for Description Logics The main problem with applying FOL approaches
for DL is that termination is not guaranteed. Although DL is a decidable fragment of
FOL, causing a FOL reasoner to terminate on DL input requires considerable adaption
of the algorithm. This paper is based on resolution methods introduced to description
logics by [6] that decide satisfiability. The algorithm we present can be seen as a dis-
tributed version of the algorithm for deciding ALC satisfiability proposed there.


2   Distributed Standard Resolution

Analyzing the applicability of the Partition-Based Reasoning approach of [1] to descrip-
tion logic revealed that the only way to avoid redundant derivations is to perform the
same inferences as a common resolution reasoner instead of connecting a set of black-
box reasoners. Therefore, the basic idea proposed in this paper is to distribute common
standard resolution by allocating each derivation step to a unique site in the distributed
system.
We first define distributed resolution before detailing this essential property of a dis-
tributed resolution calculus.


Definition 1 (Standard Resolution). For clauses C ∨ A and D ∨ ¬B with literals A
and ¬B, standard resolution is defined by

                                          C ∨ A D ∨ ¬B
                             Resolution
                                             Cσ ∨ Dσ
where σ is the most general unifier of A and B.
A distributed calculus can be created from a given resolution calculus, based on an
allocation relation that determines the modules.
Definition 2 (Allocation). An allocation function for C is a total function a : C → M
that maps clauses to module identifiers.
The allocation could also be defined by an allocation relation, thus allowing for dupli-
cation of clauses to multiple modules. However, we will show that for ALC duplication
is not necessary and we will define an appropriate allocation function. Before we turn
to distributed resolution for ALC we give a general notation of distributed resolution.
Definition 3 (Distributed Resolution). A distributed resolution method for a set of
clauses C is a tuple (R, a) of a resolution calculus and an allocation function such that
for every inference with premises Pi ∈ C, i ∈ I the allocation restriction ∃m ∈ M, ∀i ∈
I : a(Pi ) = m is satisfied.
For rules with only one premise this restriction is trivially true. A trivial allocation is
obtained by allocating all clauses to the same module (a(c) = m1 ∀c ∈ C). Non-trivial
allocations constitute a seperation into modules.
The allocation restriction enables implementation of distributed resolution by connect-
ing a set of resolution reasoners, each with a local set of clauses:
 – Every module is saturated seperately.
 – Newly derived clauses are propagated according to the allocation relation.
Considering one of these reasoners, the difference to performing centralized resolution
is, that occasionally locally derived clauses are send to another reasoner and in return,
clauses are received from other reasoners and have to be integrated in the local clause
list. In addition, a global initialization and termination routine has to be added, that
starts the separate solvers and stops if one of them finds a proof or all are saturated.
Definition 4 (Local Resolution). The local calculus for module m ∈ M in a dis-
tributed resolution method (R, a) is R with additional restriction a(Pi ) = m for every
premise Pi to every rule r ∈ R.
In contrast to the centralized case, a module that is saturated locally may have to con-
tinue reasoning once a new clause is propagated from another module.
The allocation a was defined to be functional for guarateeing that inferences are never
duplicated. Most relevant are inferences with multiple premises because of the expen-
sive search for corresponding sets of premises. The allocation restrictions ensure that
every inference of the original calculus occurs in the distributed setting.
Corollary 1. If a calculus R guarantees completeness (termination respective) for in-
put ⊆ C than every distributed reasoning method (R, a) for C with a finite set M of
modules is complete (terminates).
Considering a derivation of the empty clause in centralized resolution that is not derived
in the corresponding distributed resolution method leads to a contradiction: Assume the
empty clause is not derived from an unsatisfiable KB by distributed ordered resolution
and consider the first inference in the corresponding centralized reasoner that does not
occur in the distributed version. By the allocation restriction, all premises are allocated
to some module m the inference is carried out by local resolution in that module.
Termination is obviously preserved, no inference is added by distribution.
2.1   Problems Distributing Resolution

In general, clauses might have to be allocated to more than one module to make sure
that the premises of every derivation step are allocated to the same module. In order to
achieve an efficient method, we have to avoid duplication and allocate every clause to
as few sites as possible. Even if we assume the case of only two modules where each
may use terms defined by the other, distributing standard resolution might cause most
of the axioms in inferences be copied to the other module. To make sure that every pair
of clauses that can be resolved with each other meets in a module, we need to send all
clauses containing a foreign term (i.e. a term defined in the other module) to the other
module. Unfortunately the resolvent of clauses from different modules is very likely to
contain terms from both modules that were only used locally before, thus it has to be
copied to both modules. If we extend the approach to multiple modules we can expect
the number of duplicates to further increase.

     The partition based approach [1] adresses this problem by limiting propagation to
clauses that contain only shared symbols. It defines the shared language between mod-
ules as the clauses consisting of symbols shared by a pair of modules. The system can
be depicted as a graph, with modules as nodes and edges for every shared language, la-
belled with the shared symbols. The graph is modified to form a tree by deleting edges
and adding the corresponding symbols to other edge labels. Modules that share a sym-
bol ”s” remain connected, there is still some path in the tree connecting the modules
with ”s” contained in the label of every edge on the path. While the edges in the graph
are undirected, edges in the resulting tree point towards the root.
Whenever a local reasoner derives an axiom that is contained in a shared language (i.e.
it contains only symbols from an outgoing edge), the axiom is propagated to the cor-
responding neighbor, towards the root. In difference to our approach clauses are not
allocated to a specific module but may be passed on from one to the other until they
reach the root. This approach faces some efficiency problems: First, as mentioned be-
fore, the system infers much more clauses than necessary for refutation, because the
local reasoning has to be complete for consequence finding to guarantee completeness.
If the modules are small (i.e. the knowledge base is distributed to many modules) this
effect is negligible, but then the communication cost are very high. Second, the compu-
tation and communication is not distributed equably among the modules. The root is a
distinguished module, communication load boosts towards it. Not only are the edges di-
rected towards the root, also the number of shared language symbols is larger on edges
close to the root. (Recall that all modules that share some symbol remain connected by
a path labelled with this symbol.) Third, the partitioning depends on the query. If the
query clauses are not contained in some local language, new links have to be introduced
and repartitioning might be necessary.

    Nevertheless, we will show that with a different approach it is possible to define a
distributed resolution method for ALC that is sound and complete, and terminates. It
does not require duplication of axioms, every clause is allocated to exactly one module.
New axioms (e.g. queries) can be easily added to the system. The approach is fairly
distributed, no module stands out from the others.
3     Ordered Resolution
The problem with applying first order methods for DL is loosing decidability. An DL-
ontology input to a complete first order reasoner will not terminate in general. [8]
showed that using ordered resolution, decidability of some description logics can be
preserved by transforming the ontology into a special normal form. Before we give the
definition of the method for ALC from [6] we define the resolution rules it is based on.
Ordered resolution depends on two parameters that can be modified to improve perfor-
mance e.g. for restricted subsets of first order logic like description logic. First parame-
ter is the order of literals (for defining the maximal literals) that can be defined on top of
an order of predicate symbols, function symbols and constants. The second parameter is
the selection function that maps every clause C to a subset S(C) of it’s negative literals.


Definition 5 (Ordered Resolution).
                                     C ∨ A D ∨ ¬B
             Ordered resolution
                                        Cσ ∨ Dσ
where
 1. σ is the most general unifier of A and B
 2. either B is selected in D ∨ ¬B or else nothing is selected in D ∨ ¬B and Bσ is
    maximal w.r.t. Dσ
 3. Aσ is strictly maximal with respect to Cσ
 4. nothing is selected in Cσ ∨ Aσ
Definition 6 (Positive Factoring).
                                    C ∨A∨B
              Positive Factoring
                                    Cσ ∨ Aσ
where
 1. σ is the most general unifier of A and B
 2. Aσ is maximal with respect to Cσ ∨ Bσ
 3. nothing is selected in Cσ ∨ Aσ ∨ Bσ
Compared to unrestricted resolution, many derivations are skipped in ordered resolu-
tion. A literal A that is not selected in a clause C can only be resolved upon if nothing
else is selected in the clause and Aσ is maximal. Intuitively, the effect of the selection
function is a change in the order of literals in a clause, the selected literals are moved
to the front. In addition, the combination of multiple inferences for skipping redundant
intermediate results (i.e. hyperresolution) is controlled by selection.

3.1   Adaption to ALC
We assume a ALC ontology transformed into first order clauses (consider Figure 1 for
an example). Modules are thus sets of clauses that can be derived from the given ontol-
ogy. Different ontologies that share terms (e.g. one ontologie uses a concept defined in
another ontology) can be considered as modules, too. The ontologies are translated into
clauses seperately resulting in the same structure as an ontology that is first clausified
and than partitioned into modules.

    For preserving decidability, the ontology has to be transformed into definitorial form
before clausification. Then, resolving the clauses with ordered resolution derives an
empty clause iff the ontology is inconsistent and terminates for any ontology. Intuitively,
the definitorial form can be seen as contrary of unfolding, it splits up nested axioms into
simple ones by introducing new concepts.
Definition 7 (Definitorial Form). The definitorial form of a concept C in negation
normal form is
            
              C                            if all subterms of C are literal concepts
 Def(C) =
              {¬Q u C|p } ∪ Def(C[Q]p ) else

where C|p is a concept that occurs in C (subterm of C), but not a literal concept and all
subterms of C|p are literal concepts. C[Q]p is the concept definition that results from
replacing C|p with Q.
Definition 8 (ALC Resolution). ALC resolution (RALC ) is the calculus with
  – rules ordered resolution and factoring,
  – selection of exactly the negative binary literals, and
  – literal ordering  with R(x, f (x))  ¬C(x) and D(f (x))  ¬C(x), for all
    function symbols f , and predicates R, C, and D.
The requirement to the literal ordering is satisfied by every lexicographic path ordering
based on a total precedence > with f > P > ¬ for every function symbol f and
predicate symbol P .
Definition 9 (Lexicographic Path Ordering). A lexicographic path ordering (LPO) is
a term ordering induced by a well-founded strict precedence > over function, predicate
and logical symbols, defined by:
s = f (s1 , . . . , sm )  g(t1 , . . . , tn ) = t if and only if
at least one of the following holds
  (i) f > g and s  ti for all i with 1 ≤ i ≤ n
 (ii) f = g and for some j we have (s1 , . . . , sj−1 ) = (t1 , . . . , tj−! ), sj  tj and s  tk
      for all k with j < k ≤ n
(iii) sj  t for some j with 1 ≤ j ≤ m
Theorem 1 (RDL complexity).
For an ALC knowledge base KB, clausifying its definitorial form and saturating the
clauses by RALC decides satisfiability of KB and runs in time exponential in KB.
This theorem is the reason for using ordered resolution in our approach, we thereby
guarantee termination. Note that ordered resolution is not complete for consequence
finding and hence, applying it for the partition based reasoning approach of [1] results
in an incomplete procedure.
4     Distributed Ordered Resolution
Inuitively, the idea for distributing resolution is to allocate clauses based on their re-
solvable literals.
    – Every module ”hosts” a subset of the literals, i.e. it is responsible for all inferences
      resolving upon one of these literals.
    – Every (derived or stated) clause is moved to the modules that host a resolvable
      literal of the clause.
Essential for distributing resolution without duplicating clauses and derivations is to
reduce the number of modules that may be responsible for the next derivation to a
single module i.e. to define the resolution method in such a way that there is a unique
resolvable literal for every clause. Note that literals can be allocated to modules based
on their top symbol (i.e. the predicate P of the literal (¬)P (t) where t is a term) and a
partitioning of symbols that allocates every symbol to exactly one module.

4.1     Resolvable Literals of a Given Clause
For defining a resolution method that is complete for this communication strategy, we
investigate the options for using a given clause in an inference. The first thing we dis-
cover from the definition of ordered resolution is that a clause can only be main premise
and not side premise if some literal is selected by the selection function. By exploiting
the other requirements we obtain the decision tree in Figure 2 (assuming the decision
on the resolvable literal is not affected by unification).




          Fig. 2. Clause decision tree depicting the resolution options for a given clause.



    If nothing is selected, only a maximal literal can be resolved. If there are multiple
maximal literals, then the clause cannot be a side premise. Problematic are multiple
negative maximal literals because any of them can be the resolved literal in the next
derivation. If all multiple maximal literals are positive the clause cannot be premise
for ordered resolution because it can neither be a side premise nor a main premise. A
given clause can never be side premise and main premise at same time (in different
inferences). If the selection function is empty, the first branch on the right is pruned.
If the order is total on all literals, the branch in the middle is pruned. By exploiting
the effects of the selection and ordering on the clause tree, two problematic cases are
identified:

 ? Multiple selected literals: If literals from different moduls are selected, we cannot
    allocate the inference to one module.
 ?? Multiple negative maximal literals, none selected: We have to send the clause to
    every module that is responsible for derivations resolving upon one of these literals.

Note that these restrictions apply to the unified clause: The definition of ordered resolu-
tion requires each resolved atom Aσ to be (strictly) maximal with respect to Dσ (Cσ)
but this does in general not imply that A is (strictly) maximal w.r.t. the premise before
substitution. However, the ordering defined for RALC is admissible which implies in-
variance under substitutions and the selection function is independent of substitutions,
too. For guaranteeing termination of RALC , ordering and selection were defined in a
way that restricts the clauses to five different types [6]:



                            clause type         type of resolvable literal
                   1    R(x, f (x)) ∨ P(x)             R(x, f (x))
                   2a          P(x)                     (¬)P (x)
                   2b   P1 (f (x)) ∨ P2 (x)           (¬)P (f (x))
                   3 ¬R(x, y) ∨ P1 (x) ∨ P2 (y)         ¬R(x, y)
                   4           P(a)                     (¬)P (a)
                   5        (¬)R(a, b)                 (¬)R(a, b)


Table 1. ALC clauses. For a term t, P(t) denotes a possibly empty disjunction of the
form (¬)P1 (t) . . . (¬)Pn (t). P(f (x)) denotes a possibly empty disjunction of the form
P1 (f1 (x)) . . . Pm (fm (x)). Note that this definition allows each Pi (fi (x)) to contain positive
and negative literals. We choose a slightly different notation of the ALC clauses with the type 2
clauses subdivided because the resolvable literal is of type P (f (x)) if one exists and otherwise
P (x).




    Obviously, ALC clauses never contain more than one selected literal, thus, the first
problem does not occur in RALC . The second problem can be avoided, too, by either
applying a total order on literals or selecting one of the multiple maximal literals. The
effect is similar because considering the possible inferences, a selected literal is treated
like a strictly maximal literal. We choose the former and extend the ordering  defined
for RALC as follows.
Definition 10 (Ordering for Distributed Resolution).
For two literals of ALC clauses A and B, A D B iff A  B or A and B are
incomparable wrt.  and i) A contains a function symbol but not B or ii) A and B
                      Module A                                       Module B
    (1) Set(f (x4 )), ¬Set(x4 )                    ¬part(x2 , x3 ), ¬Set(x3 ), ¬P air(x2 ) (1)
    (2) Set(x1 ), ¬P air(x1 )                                 part(x4 , f (x4 )), ¬Set(x4 ) (2)
                                                                                   P air(a) (3)

    (3) ¬Set(f (x4 )), ¬Set(x4 ), ¬P air(x4 )←¬Set(f (x4 )), ¬Set(x4 ), ¬P air(x4 ) (4 1+2 )
(1+3 4) ¬Set(x4 ), ¬Set(x4 ), ¬P air(x4 )
(2+4 5) ¬P air(x4 )                          →                          ¬P air(x4 ) (5)
                                                                                 (6 3+5 )


Fig. 3. Example of Distributed Ordered Resolution. The example from Figure 1 is distributed to
two modules. Module A hosts literals with top predicate Set, B hosts literals with top predicate
part or P air. Stated clauses are noted above the horizontal line, below (4 1+3 ) means clause
(4) was derived from clauses (1) and (2). The first literal is the resolvable literal of a clause (let
Set > P air). Arrows depict propagation of clauses, we ommit braces for purpose of readability.
Note that in this example most clauses contain symbols hosted by the other module. In realistic
examples this overlap is very small and we can expect many of the clauses to be kept in the
module where they were infered.


contain the same number of function symbols and the top predicate of A precedes the
top predicate of B.
                                                             D
The calculus RALC with extended ordering D is denoted by RALC   .
The literals that remain incomparable wrt. D have the same top predicate and are
therefore hosted by the same module.
Corollary 2 (Distributed Ordered Resolution).
                                                    D
Distributed Ordered Resolution, i.e. the method (RALC    , a) with allocation a defined by
a(c) = part(topSymbol(resolvableLiteral(c))) based on a partitioning of predicates
part that maps every predicate to a module identifier is a distributed resolution method.
Since the resolvable literal is unique, the allocation is a function. The allocation restric-
tions are satisfied because for any pair of clauses that are premises of an inference the
resolvable literals have the same top symbol and are thus allocated to the same module.
            D                                                                        D
Hence, if RALC    is complete and terminates, by Corollary 1 the same holds for (RALC     , a).
Theorem 2 (Completeness and Termination of Distributed Ordered Resolution).
Distributed Ordered Resolution is a complete and terminating method for deciding
ALC satisfiability.

Proof. The theorem follows from completeness and termination of RALC (Theorem 1).
                          D
We have to show that RALC      is complete and terminates. The definition of ordered reso-
                                                            D
lution allows for an arbitrary ordering of literals, hence RALC is complete. We only have
to make sure that Theorem 1 is not affected by this modification and hence termination
is preserved. In fact, the requirements R(x, f (x)) D ¬C(x) and D(f (x)) D ¬C(x)
still hold for all function symbols f , and predicates R, C, and D.

Consider Figure 2 for an example refutation.
5     Experiments

The algorithm presented above is not yet implemented in an actually distributed rea-
soner, but first experiments were conducted simulating the distributed setting. For in-
vestigating distributed ordered resolution we implemented the definitorial form nor-
                                                                              D
malization and modified the resolution reasoner S PASS to perform the RALC          resolu-
tion. The crucial point for determining performance of distributed ordered resolution is
the number of propagated clauses. If the modules happen to be completely independent,
performance would be improved by a factor propotional to the number of modules. But,
in realistic settings the modules will allways share some symbols and clauses will be
exchanged between them. Propagation of a derived clause to another module is consid-
erably more expensive then just adding it to the local clause list. Apart from the distance
that has to be bridged over, a received clause requires modification of the local index
structures that are used to manage the clause lists. Hence, the ratio of derived clauses
that have to be propagated to another module is the dicisive criterion.



5.1     Setting

For our experiments we used the FOODSWAP1 ontology. First we transformed the on-
tology into its definitorial form. Then we used our partitioning tool PATO for assigning
each term (i.e. concept or property name) a module number. PATO first builds a depen-
dency graph from the ontology that is afterwards partitioned into parts that are stronger
internally connected than extenally connected2 . For minimizing the communication we
configured Pato to create a link in the dependency graph for every pair of terms ap-
pearing in the same axiom. Finally we started the S PASS reasoner with appropriate
configuration to check consistency of the ontology. For every ordered resolution deriva-
tion we recorded the module that performed the derivation (i.e. the module number of
the resolved literal) and the module to which the derived clause is send (i.e. the module
number of the resolvable literal of the derived clause).



5.2     Results

Analysing the list of source and destination of every derived clause, we found that out of
the 229 derived clauses 110 were kept in the same module, the other 119 clauses were
propagated to another module. The complete picture of the communication is depicted
in Figure 4. The communication load of 52% of the derived clauses may already be
acceptable in many applications, but still further reduction can be achieved by merging
modules as depicted in the subsequent graphs. In the three module setting only 28% of
the clauses are propagated and 26% for the two modules while the sizes remain bal-
anced.

 1
     http://www.mindswap.org/dav/ontologies/commonsense/food/foodswap.owl
 2
     A detailed description of PATO can be found in [7]
                      52%                              28%                   26%


Fig. 4. Communication between modules for the satisfiability test. Modules are denoted by circles
with the size (i.e. number of terms) of the module denoted inside. Edges are labelled with the
number of clauses propagated between the corresponding modules, propagation direction is not
distinguished for purpose of readability. The numbers below the graphs give the percentage of
derived clauses that have to be propagated.


    Positive tests like this consistency check require complete saturation whereas neg-
ative test are often answered faster by resolution reasoners. For our small test case the
time required for reasoning is dominated by the time for loading the input, but the num-
ber of derived clauses is smaller for positive subsumption checks (i.e. derivation of the
empty clause). E.g. testing the subsumption (meat v f ood) by adding its converse
(meat ∧ ¬f ood) to the clauses required only 45 derivations (with 47% propagation in
the four module setting). In contrary, the negative subsumption test (eggs v dairy) is
similar to the positive consistency check (226 derivations, 115 propagations).



6     Optimizations
The first results of distributed ordered resolution are very encouraging. However, ad-
ditional tests with different larger ontologies are required and further reduction of the
propagation ratio is desirable.

6.1   Partitioning
One of the next steps for improving performance will be to optimize the partitioning wrt.
communication. In our experiment we started with a partitioning generated by PATO
and than reduced communication by merging modules. In the same way we could start
with all modules containing only a single term and find the optimal partitioning wrt. a
given reasoning task. Finding the optimal partitioning is an apparently hard problem,
but approximations will probably meet the needs.

6.2   Selection Function
Another topic of investigation is the selection function. Up to now, it selects at most one
literal. Thus, a derived clause is very likely to be moved to another module. It would
be more efficient to select local literals first. Moreover, if we select multiple literals,
we can combine multiple inferences to a hyperresolution inference and skip redundant
intermediate results. By investigating the requirements necessary to make sure the set
of ALC clauses is closed under RALC , we can improve the selection function:


Definition 11 (Adapted Distributed Selection).

 1. Select nothing from clauses of type 1 and 5.
 2. From type 3 clauses select the negative binary literal.
 3. From clauses of type 2b select the maximal number of negative literals of type
    P (f (x)) hosted by a single module.
 4. From clauses of type 2a and 4 select the maximal number of negative literals hosted
    by a single module.

We may not select all local literals but selecting all literals hosted by one module has
a similar effect. First the clause is moved to that module, but then all local literals are
solved in the next inference.



6.3   Reduction

For efficient resolution reasoning it is not sufficient to avoid unneccessary derivations.
Additionally, reduction techniques have to be applied that delete clauses if they are
tautologies or subsumed by other clauses. Deleting newly derived clauses that are sub-
sumed is forward subsumption, backward subsumption refers to the deletion of clauses
that are subsumed by a newly derived clause. The naı̈ve approach to distributed sub-
sumption would be to propagate new clauses to all modules that host one of its literals
for forward subsumption checks and send it to modules that are responsible for literals
equal or smaller than the resolvable literal of the new clause for backward subsumtion.
Fortunately we can again take advantage of the clause typology. Of all different clause
types only type 2a can be subsumed by a clause of different type, every other clause
can only be subsumed by a clause of the same type. Thus, by indexing the clauses with
their types we can skip a large part of the subsumption tests.


7     Summary

We addressed the problem of improving the scalability of description logic reasoning
by proposing a distributed resolution method for ALC terminologies. The algorithm
extends ordered resolution with a method for assigning clauses to a unique location at
which all possible resolution steps are executed by a local solver. We analyzed the prop-
erties of such a distributed method and identified necessary conditions for guaranteeing
completeness and termination of the algorithm. Further, we have shown that the resolu-
tion method for ALC described by Motik satisfies these conditions and can therefore be
distributed across different distributed reasoners. Our investigations lay the foundation
for the implementation of a large scale reasoning infrastructure for ALC terminologies
and can be seen as a first step towards supporting the vision of the semantic web as
a distributed system of interlinked ontologies that can be reasoned upon. First experi-
ments on the performance of the proposed algorithm are promising but more detailed
investigation is necessary. In particular, we have to find a way to control the cost of
communication between local reasoners. The major task in this context is the definition
of a suitable distribution strategy based on the nature of the terminologies involved.
Further, we have to investigate possible optimizations of the reasoning method some
of which are already mentioned in Section 6. Another major issue is the development
of distributed reasoning methods for more expressive languages – the final goal is to
support reasoning in OWL which is known to be equivalent to SHOIN (D). A direct
extension of the approach proposed here is not possible because dealing with number
restrictions requires a different resolution approach, i.e. the use of paramodulation. In
this context an assignment of clauses to a certain reasoner is not feasible, duplication
not be completely avoided. An adaption of our approach with limited duplication to a
calculus that supports equality is the focus of future work.


Acknowledgement

This work was partially supported by the German Science Foundation in the Emmy-
Noether Program under contract Stu 266/3-1.


References
1. E. Amir and S. McIlraith. Partition-based logical reasoning for first-order and propositional
   theories. Artificial Intelligence, 162(1-2):49–88, 2005.
2. A. Borgida and L. Serafini. Distributed description logics: Assimilating information from peer
   sources. Journal of Data Semantics, 1:153–184, 2003.
3. Bernardo Cuenca Grau, Bijan Parsia, and Evren Sirin. Combining owl ontologies using e-
   connections. Journal Of Web Semantics, 4(1), 2005.
4. Ewing L. Lusk, William W. McCune, and John Slaney. Roo: a parallel theorem prover. In In
   Proceedings of the 11th CADE, volume 607 of LNAI, pages 731–734. Springer Verlag, 1992.
5. C. Lutz, D. Walther, and F. Wolter. Conservative extensions in expressive description logics. In
   Proceedings of the Twentieth International Joint Conference on Artificial Intelligence IJCAI-
   07. AAAI Press, 2007.
6. Boris Motik. Reasoning in Description Logics using Resolution and Deductive Databases.
   PhD thesis, Universität Karlsruhe (TH), Karlsruhe, Germany, January 2006.
7. Heiner Stuckenschmidt and Michel Klein. Structure-based partitioning of large concept hi-
   erarchies. In Sheila A. McIlraith, Dimitris Plexousakis, and Frank van Harmelen, editors,
   Proceedings of the Third International Semantic Web Conference (ISWC 2004), pages 289–
   303, Hiroshima, Japan, nov 2004.
8. Tanel Tammet. Resolution methods for Decision Problems and Finite Model Building. PhD
   thesis, Chalmers University of Technology and University of Göteborg, 1992.